Data Archival and Management (Part 2)

Using JSON

In [23]:
import json
import requests
import pandas as pd

JSON and REST API

Most websites that allow access to their data will provide a Representationl State Transfer (REST) Applicaiton Programming INterface (API). This allows you to retrive sturcrured information by constructing standard HTTP queries. The information returned is in JavaScript Object Notation (JSON). For Python users, JSON is almost exactly like a set of nested dictionareies.

We will use the Pokemon API to illustrate REST API and working with JSON.

Using HTTP get requests

In [74]:
request = requests.get('http://pokeapi.co/api/v2/machine?limit=5')

HTTP Status codes

A 200 code indicates success.

To understand other codes, see Wikipiedia

In [75]:
request.status_code
Out[75]:
200

Retrieving the JSON object

In [76]:
data = request.json()
In [78]:
data['results']
Out[78]:
[{'url': 'https://pokeapi.co/api/v2/machine/1/'},
 {'url': 'https://pokeapi.co/api/v2/machine/2/'},
 {'url': 'https://pokeapi.co/api/v2/machine/3/'},
 {'url': 'https://pokeapi.co/api/v2/machine/4/'},
 {'url': 'https://pokeapi.co/api/v2/machine/5/'}]
In [81]:
machines = [requests.get(result['url']).json()
            for result in data['results']]
In [123]:
machines[0]
Out[123]:
{'id': 1,
 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
 'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
 'version_group': {'name': 'red-blue',
  'url': 'https://pokeapi.co/api/v2/version-group/1/'}}

Saving to JSON file

In [129]:
with open('machines.json', 'w') as f:
    json.dump(machines, f)
In [130]:
! cat machines.json
[{"id": 1, "version_group": {"name": "red-blue", "url": "https://pokeapi.co/api/v2/version-group/1/"}, "move": {"name": "mega-punch", "url": "https://pokeapi.co/api/v2/move/5/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 2, "version_group": {"name": "yellow", "url": "https://pokeapi.co/api/v2/version-group/2/"}, "move": {"name": "mega-punch", "url": "https://pokeapi.co/api/v2/move/5/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 3, "version_group": {"name": "gold-silver", "url": "https://pokeapi.co/api/v2/version-group/3/"}, "move": {"name": "dynamic-punch", "url": "https://pokeapi.co/api/v2/move/223/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 4, "version_group": {"name": "crystal", "url": "https://pokeapi.co/api/v2/version-group/4/"}, "move": {"name": "dynamic-punch", "url": "https://pokeapi.co/api/v2/move/223/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}, {"id": 5, "version_group": {"name": "ruby-sapphire", "url": "https://pokeapi.co/api/v2/version-group/5/"}, "move": {"name": "focus-punch", "url": "https://pokeapi.co/api/v2/move/264/"}, "item": {"name": "tm01", "url": "https://pokeapi.co/api/v2/item/305/"}}]

Reading from JSON file

In [134]:
with open('machines.json') as f:
    data = json.load(f)
data
Out[134]:
[{'id': 1,
  'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
  'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
  'version_group': {'name': 'red-blue',
   'url': 'https://pokeapi.co/api/v2/version-group/1/'}},
 {'id': 2,
  'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
  'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},
  'version_group': {'name': 'yellow',
   'url': 'https://pokeapi.co/api/v2/version-group/2/'}},
 {'id': 3,
  'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
  'move': {'name': 'dynamic-punch',
   'url': 'https://pokeapi.co/api/v2/move/223/'},
  'version_group': {'name': 'gold-silver',
   'url': 'https://pokeapi.co/api/v2/version-group/3/'}},
 {'id': 4,
  'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
  'move': {'name': 'dynamic-punch',
   'url': 'https://pokeapi.co/api/v2/move/223/'},
  'version_group': {'name': 'crystal',
   'url': 'https://pokeapi.co/api/v2/version-group/4/'}},
 {'id': 5,
  'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},
  'move': {'name': 'focus-punch',
   'url': 'https://pokeapi.co/api/v2/move/264/'},
  'version_group': {'name': 'ruby-sapphire',
   'url': 'https://pokeapi.co/api/v2/version-group/5/'}}]

Loading into pandas

Direct loading of a JSON file

This does not “unpack” the nested arrays.

In [137]:
pd.read_json('machines.json')
Out[137]:
id item move version_group
0 1 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... {'name': 'mega-punch', 'url': 'https://pokeapi... {'name': 'red-blue', 'url': 'https://pokeapi.c...
1 2 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... {'name': 'mega-punch', 'url': 'https://pokeapi... {'name': 'yellow', 'url': 'https://pokeapi.co/...
2 3 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... {'name': 'dynamic-punch', 'url': 'https://poke... {'name': 'gold-silver', 'url': 'https://pokeap...
3 4 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... {'name': 'dynamic-punch', 'url': 'https://poke... {'name': 'crystal', 'url': 'https://pokeapi.co...
4 5 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... {'name': 'focus-punch', 'url': 'https://pokeap... {'name': 'ruby-sapphire', 'url': 'https://poke...
Use the json_normalize utility to unpack nested structures
In [138]:
from pandas.io.json import json_normalize
In [140]:
json_normalize(json.load(open('machines.json')))
Out[140]:
id item.name item.url move.name move.url version_group.name version_group.url
0 1 tm01 https://pokeapi.co/api/v2/item/305/ mega-punch https://pokeapi.co/api/v2/move/5/ red-blue https://pokeapi.co/api/v2/version-group/1/
1 2 tm01 https://pokeapi.co/api/v2/item/305/ mega-punch https://pokeapi.co/api/v2/move/5/ yellow https://pokeapi.co/api/v2/version-group/2/
2 3 tm01 https://pokeapi.co/api/v2/item/305/ dynamic-punch https://pokeapi.co/api/v2/move/223/ gold-silver https://pokeapi.co/api/v2/version-group/3/
3 4 tm01 https://pokeapi.co/api/v2/item/305/ dynamic-punch https://pokeapi.co/api/v2/move/223/ crystal https://pokeapi.co/api/v2/version-group/4/
4 5 tm01 https://pokeapi.co/api/v2/item/305/ focus-punch https://pokeapi.co/api/v2/move/264/ ruby-sapphire https://pokeapi.co/api/v2/version-group/5/

A more compicated JSON object

Example of working with nested JSON objects and coversion to DataFrame.

In [85]:
request = requests.get('http://pokeapi.co/api/v2/pokemon/1')
In [87]:
request.status_code
Out[87]:
200
In [88]:
data = request.json()
In [89]:
data.keys()
Out[89]:
dict_keys(['is_default', 'order', 'held_items', 'abilities', 'stats', 'base_experience', 'species', 'sprites', 'id', 'name', 'weight', 'forms', 'location_area_encounters', 'types', 'game_indices', 'height', 'moves'])
In [90]:
data['id']
Out[90]:
1
In [91]:
data['name']
Out[91]:
'bulbasaur'
In [94]:
data['abilities']
Out[94]:
[{'ability': {'name': 'chlorophyll',
   'url': 'https://pokeapi.co/api/v2/ability/34/'},
  'is_hidden': True,
  'slot': 3},
 {'ability': {'name': 'overgrow',
   'url': 'https://pokeapi.co/api/v2/ability/65/'},
  'is_hidden': False,
  'slot': 1}]

Initial coversion to DataFrame

In [96]:
bulbasaur = json_normalize(data, ['abilities'], ['id', 'name'])
In [97]:
bulbasaur
Out[97]:
ability is_hidden slot name id
0 {'name': 'chlorophyll', 'url': 'https://pokeap... True 3 bulbasaur 1
1 {'name': 'overgrow', 'url': 'https://pokeapi.c... False 1 bulbasaur 1

Parsing dictionary in ability to seeparae columns

In [119]:
(
    bulbasaur['ability'].
    apply(pd.Series)
)
Out[119]:
name url
0 chlorophyll https://pokeapi.co/api/v2/ability/34/
1 overgrow https://pokeapi.co/api/v2/ability/65/

Renaeme abiliyt name to avoid with pokemon name

In [109]:
(
    bulbasaur['ability'].
    apply(pd.Series).
    rename_axis({'name': 'ability_name'}, axis=1)
)
Out[109]:
ability_name url
0 chlorophyll https://pokeapi.co/api/v2/ability/34/
1 overgrow https://pokeapi.co/api/v2/ability/65/

Join with origanl DataFrame and rearrange column order

In [120]:
(
    bulbasaur['ability'].
    apply(pd.Series).
    rename_axis({'name': 'ability_name'}, axis=1).
    join(bulbasaur.drop('ability', axis=1)).
    filter(['id', 'name', 'slot', 'is_hidden', 'ability_name', 'url'])
)
Out[120]:
id name slot is_hidden ability_name url
0 1 bulbasaur 3 True chlorophyll https://pokeapi.co/api/v2/ability/34/
1 1 bulbasaur 1 False overgrow https://pokeapi.co/api/v2/ability/65/
In [ ]: