While waiting for Pokemon Go to hit my region, I decided to try out the excellent Poke API. I then analyzed the Pokemon stats with Principal Component Analysis (PCA) to see how they relate to each other.
Download ‘em all
To get the data, I accessed the API with R’s httr package, downloading all the stats and sprites for the original 151 Pokemon. You can also skip this part by downloading the data directly from their GitHub repository.
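For readers who want to follow along, here is a rough sketch of what the download step could look like. It is not the exact code I ran: the endpoint URL, the JSON field names (`stats`, `base_stat`, `stat$name`), and the helper names `parse_stats` and `fetch_pokemon` are all assumptions about how the API is laid out, so check them against the API docs before relying on them.

```r
# Sketch only: the endpoint and field names below are assumptions about the API.
base_url <- "https://pokeapi.co/api/v2/pokemon/"

# Turn one parsed Pokemon record (a nested list) into a named numeric vector.
# Assumes `pokemon$stats` is a list of entries, each holding `base_stat`
# and a nested `stat$name`.
parse_stats <- function(pokemon) {
  values <- vapply(pokemon$stats, function(s) as.numeric(s$base_stat), numeric(1))
  names(values) <- vapply(pokemon$stats, function(s) s$stat$name, character(1))
  values
}

# Fetch one Pokemon by number. Needs the httr package and network access.
fetch_pokemon <- function(id) {
  resp <- httr::GET(paste0(base_url, id))
  httr::stop_for_status(resp)          # fail loudly on HTTP errors
  httr::content(resp, as = "parsed")   # parsed JSON as a nested list
}

# Not run here: loop over the original 151 and bind the stats into a matrix.
# pokemons <- lapply(1:151, fetch_pokemon)
# stats <- t(sapply(pokemons, parse_stats))
# rownames(stats) <- sapply(pokemons, function(p) p$name)
```

Keeping the parsing separate from the HTTP call makes it easy to rerun the analysis from a saved copy of the data without hitting the API again.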
Use Principal Component Analysis (PCA) to decompose Pokemon’s stats
Each Pokemon has 6 stats: HP, Attack, Defense, Speed, Special Attack, and Special Defense. To get a glimpse into how the game designers assign values to these stats, we use PCA to discover the latent factors that summarize them. These latent factors are called principal components.[1]
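As a minimal sketch of this step, base R’s `prcomp` does the decomposition in one call. The matrix below is made-up toy data standing in for the real 151 x 6 stats table, and the row and column names are placeholders I chose for illustration.

```r
# Toy stand-in for the real stats matrix: 20 hypothetical Pokemon, 6 stats.
# The values are random, not real game data.
set.seed(1)
stat_names <- c("hp", "attack", "defense", "speed", "sp_attack", "sp_defense")
stats <- matrix(sample(20:150, 20 * 6, replace = TRUE), ncol = 6,
                dimnames = list(paste0("pokemon_", 1:20), stat_names))

# prcomp centers each column by default; scale. = FALSE keeps the raw units.
pca <- prcomp(stats, center = TRUE, scale. = FALSE)

pca$rotation   # loadings: how each stat contributes to each component
head(pca$x)    # scores: each Pokemon's position along each component
```

Passing `scale. = TRUE` would standardize every stat to unit variance first. I leave it off here because all six stats are on the same scale.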
To see how much variance in Pokemon’s stats is explained by the principal components, we use the screeplot, which plots the variance of each principal component on the y-axis against the component number on the x-axis. In this case, we see that the first two principal components capture about half of the variance.
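The “about half” figure can be read off numerically as well as from the plot. Here is a sketch, again on made-up toy data, so the numbers it prints will not match the real Pokemon result.

```r
# Toy data again (random values, not real stats).
set.seed(1)
stats <- matrix(sample(20:150, 20 * 6, replace = TRUE), ncol = 6)
pca <- prcomp(stats)

# Each component's variance is its squared standard deviation.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)
print(round(cum_explained, 2))   # second entry = share captured by PC1 + PC2

# The same information as a picture.
screeplot(pca, type = "lines", main = "Variance by principal component")
```

`summary(pca)` prints the same proportions in a table if you prefer not to compute them by hand.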
What do these two principal components represent? To understand their meaning, we project the 6 original stats onto the lower 2-dimensional space spanned by the two principal components. The 6 original stats are shown as the red vectors in the biplot.
We see that all stats point toward the right along Principal Component 1 (PC1). So we could interpret PC1 as “Overall Strength”: the higher the value of PC1, the higher the value of all 6 stats.
On the other hand, along PC2, we see that HP, Attack, and Defense point in a different direction from Speed, Special Attack, and Special Defense. So we could interpret PC2 as some sort of “Brawn over Brain”: the higher the value of PC2, the higher the regular stats and the lower the special stats. (NB: Regular stats, e.g. Attack and Defense, affect physical moves, while special stats, e.g. Special Attack and Special Defense, affect elemental moves.)
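To make this reading of the loadings concrete, here is a sketch on synthetic data that I deliberately built to have the same structure: one hypothetical factor shared by all six stats, plus one that pushes the regular stats up and the other three down. The factor names and sizes are assumptions for illustration, not estimates from the real data. One thing to keep in mind is that the sign of a principal component is arbitrary, so “pointing right” versus “pointing left” carries no meaning on its own. What matters is which stats point the same way.

```r
set.seed(42)
n <- 200
strength <- rnorm(n, mean = 0, sd = 20)   # hypothetical shared factor
brawn    <- rnorm(n, mean = 0, sd = 10)   # hypothetical regular-vs-special contrast
noise    <- function() rnorm(n, mean = 0, sd = 3)

# Synthetic stats: everything rises with strength; brawn splits the two groups.
stats <- cbind(
  hp         = 70 + strength + brawn + noise(),
  attack     = 70 + strength + brawn + noise(),
  defense    = 70 + strength + brawn + noise(),
  speed      = 70 + strength - brawn + noise(),
  sp_attack  = 70 + strength - brawn + noise(),
  sp_defense = 70 + strength - brawn + noise()
)

pca <- prcomp(stats)
loadings <- pca$rotation[, c("PC1", "PC2")]

# Component signs are arbitrary, so flip them to a fixed convention.
if (sum(loadings[, "PC1"]) < 0) loadings[, "PC1"] <- -loadings[, "PC1"]
if (loadings["attack", "PC2"] < 0) loadings[, "PC2"] <- -loadings[, "PC2"]

print(round(loadings, 2))   # PC1: one sign throughout; PC2: two opposing groups
biplot(pca, cex = 0.6)      # variables drawn as arrows over the scores
```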
As you can see, PC1 is pretty intuitive: one can guess from the outset that Pokemon should vary a lot in their overall strength. To me, what’s really interesting is PC2, which shows that the second most important way that Pokemon vary is along the Regular Stat-vs-Special Stat dimension (which I call Brawn over Brain). That wasn’t something I expected before the analysis.
So here are the 151 Pokemon plotted on the space spanned by these two PCs. Mewtwo and Magikarp are, unsurprisingly, at the two extremes of the “Overall Strength” PC.
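A plot like this comes straight from the PCA scores. The sketch below uses made-up toy data with placeholder names, so the “strongest” and “weakest” it reports are not real Pokemon. It only shows the mechanics of plotting the first two components and picking out the extremes along PC1.

```r
# Toy data with placeholder names (random values, not real stats).
set.seed(1)
stats <- matrix(sample(20:150, 20 * 6, replace = TRUE), ncol = 6,
                dimnames = list(paste0("pokemon_", 1:20), NULL))
pca <- prcomp(stats)

# Scores on the first two components: one row per Pokemon.
scores <- pca$x[, 1:2]

# Label each point with its name instead of drawing a dot.
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), cex = 0.7)

# The two extremes along PC1. Which end counts as "strong" depends on the
# (arbitrary) sign of the component, so check the loadings first.
high_end <- rownames(scores)[which.max(scores[, "PC1"])]
low_end  <- rownames(scores)[which.min(scores[, "PC1"])]
```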
One way to think about PC2 is via the classic dilemma of choosing which Psychic Pokemon to put on your team: Hypno or Alakazam. Looking at where these two Pokemon sit on the graph, we can see that both have similar overall strength (with Alakazam having slightly more). But Alakazam sits at the bottom of PC2, i.e. it leans heavily towards special stats at the expense of regular stats. Hypno, on the other hand, has very balanced stats. That’s why lots of people choose Hypno as a robust Psychic type rather than the glass cannon that is Alakazam.
The three starter Pokemon are very balanced, with Charmander having a slight edge as it evolves into Charizard. If you squint really hard, it also seems that the strongest Pokemon at the first stage (Squirtle) turns out to be the weakest at the final stage (Blastoise), and vice versa for Charmander and Charizard. That’s game balance design at work!
The final plot shows how Arcanine, my favorite Pokemon, rivals the Legendary birds in terms of stats. Indeed, Arcanine was planned to be a Legendary, but got changed before the game came out. Reddit has an entire thread devoted to this “Pokemon conspiracy”.
[1] It’s recommended to normalize the data before running PCA so that the variances of the variables are not affected by the units they are measured in. In this case it’s not necessary because Pokemon stats are all on the same scale.