Data Archival and Management (Part 2)¶
Using JSON¶
In [23]:
import json
import requests
import pandas as pd
JSON and REST API¶
Most websites that allow access to their data will provide a Representationl State Transfer (REST) Applicaiton Programming INterface (API). This allows you to retrive sturcrured information by constructing standard HTTP queries. The information returned is in JavaScript Object Notation (JSON). For Python users, JSON is almost exactly like a set of nested dictionareies.
We will use the Pokemon API to illustrate REST API and working with JSON.
Using HTTP get
requests¶
In [74]:
request = requests.get('http://pokeapi.co/api/v2/machine?limit=5')
HTTP Status codes¶
A 200 code indicates success.
To understand other codes, see Wikipiedia
In [75]:
request.status_code
Out[75]:
200
Retrieving the JSON object¶
In [76]:
data = request.json()
In [78]:
data['results']
Out[78]:
[{'url': 'https://pokeapi.co/api/v2/machine/1/'},
{'url': 'https://pokeapi.co/api/v2/machine/2/'},
{'url': 'https://pokeapi.co/api/v2/machine/3/'},
{'url': 'https://pokeapi.co/api/v2/machine/4/'},
{'url': 'https://pokeapi.co/api/v2/machine/5/'}]
In [81]:
machines = [requests.get(result['url']).json()
for result in data['results']]
In [123]:
machines[0]
Out[123]:
{'id': 1,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
'version_group': {'name': 'red-blue',
'url': 'https://pokeapi.co/api/v2/version-group/1/'}}
Saving to JSON file¶
In [129]:
with open('machines.json', 'w') as f:
json.dump(machines, f)
In [130]:
! cat machines.json
[{"id": 1, "version_group": {"name": "red-blue", "url": "https://pokeapi.co/api/v2/version-group/1/"}, "move": {"name": "mega-punch", "url": "https://pokeapi.co/api/v2/move/5/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 2, "version_group": {"name": "yellow", "url": "https://pokeapi.co/api/v2/version-group/2/"}, "move": {"name": "mega-punch", "url": "https://pokeapi.co/api/v2/move/5/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 3, "version_group": {"name": "gold-silver", "url": "https://pokeapi.co/api/v2/version-group/3/"}, "move": {"name": "dynamic-punch", "url": "https://pokeapi.co/api/v2/move/223/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 4, "version_group": {"name": "crystal", "url": "https://pokeapi.co/api/v2/version-group/4/"}, "move": {"name": "dynamic-punch", "url": "https://pokeapi.co/api/v2/move/223/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 5, "version_group": {"name": "ruby-sapphire", "url": "https://pokeapi.co/api/v2/version-group/5/"}, "move": {"name": "focus-punch", "url": "https://pokeapi.co/api/v2/move/264/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}]
Reading from JSON file¶
In [134]:
with open('machines.json') as f:
data = json.load(f)
data
Out[134]:
[{'id': 1,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
'version_group': {'name': 'red-blue',
'url': 'https://pokeapi.co/api/v2/version-group/1/'}},
{'id': 2,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
'version_group': {'name': 'yellow',
'url': 'https://pokeapi.co/api/v2/version-group/2/'}},
{'id': 3,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'dynamic-punch',
'url': 'https://pokeapi.co/api/v2/move/223/'},
'version_group': {'name': 'gold-silver',
'url': 'https://pokeapi.co/api/v2/version-group/3/'}},
{'id': 4,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'dynamic-punch',
'url': 'https://pokeapi.co/api/v2/move/223/'},
'version_group': {'name': 'crystal',
'url': 'https://pokeapi.co/api/v2/version-group/4/'}},
{'id': 5,
'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
'move': {'name': 'focus-punch',
'url': 'https://pokeapi.co/api/v2/move/264/'},
'version_group': {'name': 'ruby-sapphire',
'url': 'https://pokeapi.co/api/v2/version-group/5/'}}]
Loading into pandas
¶
Direct loading of a JSON file¶
This does not “unpack” the nested arrays.
In [137]:
pd.read_json('machines.json')
Out[137]:
id | item | move | version_group | |
---|---|---|---|---|
0 | 1 | {'name': 'tm01', 'url': 'https://pokeapi.co/ap... | {'name': 'mega-punch', 'url': 'https://pokeapi... | {'name': 'red-blue', 'url': 'https://pokeapi.c... |
1 | 2 | {'name': 'tm01', 'url': 'https://pokeapi.co/ap... | {'name': 'mega-punch', 'url': 'https://pokeapi... | {'name': 'yellow', 'url': 'https://pokeapi.co/... |
2 | 3 | {'name': 'tm01', 'url': 'https://pokeapi.co/ap... | {'name': 'dynamic-punch', 'url': 'https://poke... | {'name': 'gold-silver', 'url': 'https://pokeap... |
3 | 4 | {'name': 'tm01', 'url': 'https://pokeapi.co/ap... | {'name': 'dynamic-punch', 'url': 'https://poke... | {'name': 'crystal', 'url': 'https://pokeapi.co... |
4 | 5 | {'name': 'tm01', 'url': 'https://pokeapi.co/ap... | {'name': 'focus-punch', 'url': 'https://pokeap... | {'name': 'ruby-sapphire', 'url': 'https://poke... |
Use the json_normalize
utility to unpack nested structures¶
In [138]:
from pandas.io.json import json_normalize
In [140]:
json_normalize(json.load(open('machines.json')))
Out[140]:
id | item.name | item.url | move.name | move.url | version_group.name | version_group.url | |
---|---|---|---|---|---|---|---|
0 | 1 | tm01 | https://pokeapi.co/api/v2/item/305/ | mega-punch | https://pokeapi.co/api/v2/move/5/ | red-blue | https://pokeapi.co/api/v2/version-group/1/ |
1 | 2 | tm01 | https://pokeapi.co/api/v2/item/305/ | mega-punch | https://pokeapi.co/api/v2/move/5/ | yellow | https://pokeapi.co/api/v2/version-group/2/ |
2 | 3 | tm01 | https://pokeapi.co/api/v2/item/305/ | dynamic-punch | https://pokeapi.co/api/v2/move/223/ | gold-silver | https://pokeapi.co/api/v2/version-group/3/ |
3 | 4 | tm01 | https://pokeapi.co/api/v2/item/305/ | dynamic-punch | https://pokeapi.co/api/v2/move/223/ | crystal | https://pokeapi.co/api/v2/version-group/4/ |
4 | 5 | tm01 | https://pokeapi.co/api/v2/item/305/ | focus-punch | https://pokeapi.co/api/v2/move/264/ | ruby-sapphire | https://pokeapi.co/api/v2/version-group/5/ |
A more compicated JSON object¶
Example of working with nested JSON objects and coversion to DataFrame.
In [85]:
request = requests.get('http://pokeapi.co/api/v2/pokemon/1')
In [87]:
request.status_code
Out[87]:
200
In [88]:
data = request.json()
In [89]:
data.keys()
Out[89]:
dict_keys(['is_default', 'order', 'held_items', 'abilities', 'stats', 'base_experience', 'species', 'sprites', 'id', 'name', 'weight', 'forms', 'location_area_encounters', 'types', 'game_indices', 'height', 'moves'])
In [90]:
data['id']
Out[90]:
1
In [91]:
data['name']
Out[91]:
'bulbasaur'
In [94]:
data['abilities']
Out[94]:
[{'ability': {'name': 'chlorophyll',
'url': 'https://pokeapi.co/api/v2/ability/34/'},
'is_hidden': True,
'slot': 3},
{'ability': {'name': 'overgrow',
'url': 'https://pokeapi.co/api/v2/ability/65/'},
'is_hidden': False,
'slot': 1}]
Initial coversion to DataFrame¶
In [96]:
bulbasaur = json_normalize(data, ['abilities'], ['id', 'name'])
In [97]:
bulbasaur
Out[97]:
ability | is_hidden | slot | name | id | |
---|---|---|---|---|---|
0 | {'name': 'chlorophyll', 'url': 'https://pokeap... | True | 3 | bulbasaur | 1 |
1 | {'name': 'overgrow', 'url': 'https://pokeapi.c... | False | 1 | bulbasaur | 1 |
Parsing dictionary in ability to seeparae columns¶
In [119]:
(
bulbasaur['ability'].
apply(pd.Series)
)
Out[119]:
name | url | |
---|---|---|
0 | chlorophyll | https://pokeapi.co/api/v2/ability/34/ |
1 | overgrow | https://pokeapi.co/api/v2/ability/65/ |
Renaeme abiliyt name to avoid with pokemon name¶
In [109]:
(
bulbasaur['ability'].
apply(pd.Series).
rename_axis({'name': 'ability_name'}, axis=1)
)
Out[109]:
ability_name | url | |
---|---|---|
0 | chlorophyll | https://pokeapi.co/api/v2/ability/34/ |
1 | overgrow | https://pokeapi.co/api/v2/ability/65/ |
Join with origanl DataFrame and rearrange column order¶
In [120]:
(
bulbasaur['ability'].
apply(pd.Series).
rename_axis({'name': 'ability_name'}, axis=1).
join(bulbasaur.drop('ability', axis=1)).
filter(['id', 'name', 'slot', 'is_hidden', 'ability_name', 'url'])
)
Out[120]:
id | name | slot | is_hidden | ability_name | url | |
---|---|---|---|---|---|---|
0 | 1 | bulbasaur | 3 | True | chlorophyll | https://pokeapi.co/api/v2/ability/34/ |
1 | 1 | bulbasaur | 1 | False | overgrow | https://pokeapi.co/api/v2/ability/65/ |
In [ ]: