{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Archival and Management (Part 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using JSON" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import json\n", "import requests\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON and REST API\n", "\n", "Most websites that allow access to their data will provide a Representationl State Transfer (REST) Applicaiton Programming INterface (API). This allows you to retrive sturcrured information by constructing standard HTTP queries. The information returned is in JavaScript Object Notation (JSON). For Python users, JSON is almost exactly like a set of nested dictionareies.\n", "\n", "We will use the [Pokemon API](https://pokeapi.co/docsv2/) to illustrate REST API and working with JSON." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using HTTP `get` requests" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": true }, "outputs": [], "source": [ "request = requests.get('http://pokeapi.co/api/v2/machine?limit=5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### HTTP Status codes\n", "\n", "A 200 code indicates success.\n", "\n", "To understand other codes, see [Wikipiedia](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "request.status_code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Retrieving the JSON object" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = request.json()" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'url': 'https://pokeapi.co/api/v2/machine/1/'},\n", " {'url': 'https://pokeapi.co/api/v2/machine/2/'},\n", " {'url': 'https://pokeapi.co/api/v2/machine/3/'},\n", " {'url': 'https://pokeapi.co/api/v2/machine/4/'},\n", " {'url': 'https://pokeapi.co/api/v2/machine/5/'}]" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['results']" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "machines = [requests.get(result['url']).json() \n", " for result in data['results']]" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 1,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},\n", " 'version_group': {'name': 'red-blue',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/1/'}}" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "machines[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Saving to JSON file" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [], "source": [ "with open('machines.json', 'w') as f:\n", " json.dump(machines, f)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{\"id\": 1, \"version_group\": {\"name\": \"red-blue\", \"url\": \"https://pokeapi.co/api/v2/version-group/1/\"}, \"move\": {\"name\": \"mega-punch\", \"url\": \"https://pokeapi.co/api/v2/move/5/\"}, \"item\": {\"name\": \"tm01\", \"url\": \"https://pokeapi.co/api/v2/item/305/\"}}, {\"id\": 2, \"version_group\": {\"name\": \"yellow\", \"url\": \"https://pokeapi.co/api/v2/version-group/2/\"}, \"move\": {\"name\": \"mega-punch\", \"url\": \"https://pokeapi.co/api/v2/move/5/\"}, \"item\": {\"name\": \"tm01\", \"url\": \"https://pokeapi.co/api/v2/item/305/\"}}, {\"id\": 3, \"version_group\": {\"name\": \"gold-silver\", \"url\": \"https://pokeapi.co/api/v2/version-group/3/\"}, \"move\": {\"name\": \"dynamic-punch\", \"url\": \"https://pokeapi.co/api/v2/move/223/\"}, \"item\": {\"name\": \"tm01\", \"url\": \"https://pokeapi.co/api/v2/item/305/\"}}, {\"id\": 4, \"version_group\": {\"name\": \"crystal\", \"url\": \"https://pokeapi.co/api/v2/version-group/4/\"}, \"move\": {\"name\": \"dynamic-punch\", \"url\": \"https://pokeapi.co/api/v2/move/223/\"}, \"item\": {\"name\": \"tm01\", \"url\": \"https://pokeapi.co/api/v2/item/305/\"}}, {\"id\": 5, \"version_group\": {\"name\": \"ruby-sapphire\", \"url\": \"https://pokeapi.co/api/v2/version-group/5/\"}, \"move\": {\"name\": \"focus-punch\", \"url\": \"https://pokeapi.co/api/v2/move/264/\"}, \"item\": {\"name\": \"tm01\", \"url\": \"https://pokeapi.co/api/v2/item/305/\"}}]" ] } ], "source": [ "! cat machines.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reading from JSON file" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 1,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},\n", " 'version_group': {'name': 'red-blue',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/1/'}},\n", " {'id': 2,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'mega-punch', 'url': 'https://pokeapi.co/api/v2/move/5/'},\n", " 'version_group': {'name': 'yellow',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/2/'}},\n", " {'id': 3,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'dynamic-punch',\n", " 'url': 'https://pokeapi.co/api/v2/move/223/'},\n", " 'version_group': {'name': 'gold-silver',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/3/'}},\n", " {'id': 4,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'dynamic-punch',\n", " 'url': 'https://pokeapi.co/api/v2/move/223/'},\n", " 'version_group': {'name': 'crystal',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/4/'}},\n", " {'id': 5,\n", " 'item': {'name': 'tm01', 'url': 'https://pokeapi.co/api/v2/item/305/'},\n", " 'move': {'name': 'focus-punch',\n", " 'url': 'https://pokeapi.co/api/v2/move/264/'},\n", " 'version_group': {'name': 'ruby-sapphire',\n", " 'url': 'https://pokeapi.co/api/v2/version-group/5/'}}]" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('machines.json') as f:\n", " data = json.load(f)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Loading into `pandas`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Direct loading of a JSON file \n", "\n", "This does not \"unpack\" the nested arrays." ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iditemmoveversion_group
01{'name': 'tm01', 'url': 'https://pokeapi.co/ap...{'name': 'mega-punch', 'url': 'https://pokeapi...{'name': 'red-blue', 'url': 'https://pokeapi.c...
12{'name': 'tm01', 'url': 'https://pokeapi.co/ap...{'name': 'mega-punch', 'url': 'https://pokeapi...{'name': 'yellow', 'url': 'https://pokeapi.co/...
23{'name': 'tm01', 'url': 'https://pokeapi.co/ap...{'name': 'dynamic-punch', 'url': 'https://poke...{'name': 'gold-silver', 'url': 'https://pokeap...
34{'name': 'tm01', 'url': 'https://pokeapi.co/ap...{'name': 'dynamic-punch', 'url': 'https://poke...{'name': 'crystal', 'url': 'https://pokeapi.co...
45{'name': 'tm01', 'url': 'https://pokeapi.co/ap...{'name': 'focus-punch', 'url': 'https://pokeap...{'name': 'ruby-sapphire', 'url': 'https://poke...
\n", "
" ], "text/plain": [ " id item \\\n", "0 1 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... \n", "1 2 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... \n", "2 3 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... \n", "3 4 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... \n", "4 5 {'name': 'tm01', 'url': 'https://pokeapi.co/ap... \n", "\n", " move \\\n", "0 {'name': 'mega-punch', 'url': 'https://pokeapi... \n", "1 {'name': 'mega-punch', 'url': 'https://pokeapi... \n", "2 {'name': 'dynamic-punch', 'url': 'https://poke... \n", "3 {'name': 'dynamic-punch', 'url': 'https://poke... \n", "4 {'name': 'focus-punch', 'url': 'https://pokeap... \n", "\n", " version_group \n", "0 {'name': 'red-blue', 'url': 'https://pokeapi.c... \n", "1 {'name': 'yellow', 'url': 'https://pokeapi.co/... \n", "2 {'name': 'gold-silver', 'url': 'https://pokeap... \n", "3 {'name': 'crystal', 'url': 'https://pokeapi.co... \n", "4 {'name': 'ruby-sapphire', 'url': 'https://poke... " ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_json('machines.json')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Use the `json_normalize` utility to unpack nested structures" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pandas.io.json import json_normalize" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iditem.nameitem.urlmove.namemove.urlversion_group.nameversion_group.url
01tm01https://pokeapi.co/api/v2/item/305/mega-punchhttps://pokeapi.co/api/v2/move/5/red-bluehttps://pokeapi.co/api/v2/version-group/1/
12tm01https://pokeapi.co/api/v2/item/305/mega-punchhttps://pokeapi.co/api/v2/move/5/yellowhttps://pokeapi.co/api/v2/version-group/2/
23tm01https://pokeapi.co/api/v2/item/305/dynamic-punchhttps://pokeapi.co/api/v2/move/223/gold-silverhttps://pokeapi.co/api/v2/version-group/3/
34tm01https://pokeapi.co/api/v2/item/305/dynamic-punchhttps://pokeapi.co/api/v2/move/223/crystalhttps://pokeapi.co/api/v2/version-group/4/
45tm01https://pokeapi.co/api/v2/item/305/focus-punchhttps://pokeapi.co/api/v2/move/264/ruby-sapphirehttps://pokeapi.co/api/v2/version-group/5/
\n", "
" ], "text/plain": [ " id item.name item.url move.name \\\n", "0 1 tm01 https://pokeapi.co/api/v2/item/305/ mega-punch \n", "1 2 tm01 https://pokeapi.co/api/v2/item/305/ mega-punch \n", "2 3 tm01 https://pokeapi.co/api/v2/item/305/ dynamic-punch \n", "3 4 tm01 https://pokeapi.co/api/v2/item/305/ dynamic-punch \n", "4 5 tm01 https://pokeapi.co/api/v2/item/305/ focus-punch \n", "\n", " move.url version_group.name \\\n", "0 https://pokeapi.co/api/v2/move/5/ red-blue \n", "1 https://pokeapi.co/api/v2/move/5/ yellow \n", "2 https://pokeapi.co/api/v2/move/223/ gold-silver \n", "3 https://pokeapi.co/api/v2/move/223/ crystal \n", "4 https://pokeapi.co/api/v2/move/264/ ruby-sapphire \n", "\n", " version_group.url \n", "0 https://pokeapi.co/api/v2/version-group/1/ \n", "1 https://pokeapi.co/api/v2/version-group/2/ \n", "2 https://pokeapi.co/api/v2/version-group/3/ \n", "3 https://pokeapi.co/api/v2/version-group/4/ \n", "4 https://pokeapi.co/api/v2/version-group/5/ " ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_normalize(json.load(open('machines.json')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A more compicated JSON object\n", "\n", "Example of working with nested JSON objects and coversion to DataFrame." ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "request = requests.get('http://pokeapi.co/api/v2/pokemon/1')" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "request.status_code" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = request.json()" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['is_default', 'order', 'held_items', 'abilities', 'stats', 'base_experience', 'species', 'sprites', 'id', 'name', 'weight', 'forms', 'location_area_encounters', 'types', 'game_indices', 'height', 'moves'])" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.keys()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['id']" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bulbasaur'" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['name']" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "[{'ability': {'name': 'chlorophyll',\n", " 'url': 'https://pokeapi.co/api/v2/ability/34/'},\n", " 'is_hidden': True,\n", " 'slot': 3},\n", " {'ability': {'name': 'overgrow',\n", " 'url': 'https://pokeapi.co/api/v2/ability/65/'},\n", " 'is_hidden': False,\n", " 'slot': 1}]" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['abilities']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Initial coversion to DataFrame" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "bulbasaur = json_normalize(data, ['abilities'], ['id', 'name'])" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abilityis_hiddenslotnameid
0{'name': 'chlorophyll', 'url': 'https://pokeap...True3bulbasaur1
1{'name': 'overgrow', 'url': 'https://pokeapi.c...False1bulbasaur1
\n", "
" ], "text/plain": [ " ability is_hidden slot \\\n", "0 {'name': 'chlorophyll', 'url': 'https://pokeap... True 3 \n", "1 {'name': 'overgrow', 'url': 'https://pokeapi.c... False 1 \n", "\n", " name id \n", "0 bulbasaur 1 \n", "1 bulbasaur 1 " ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bulbasaur" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Parsing dictionary in ability to seeparae columns" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameurl
0chlorophyllhttps://pokeapi.co/api/v2/ability/34/
1overgrowhttps://pokeapi.co/api/v2/ability/65/
\n", "
" ], "text/plain": [ " name url\n", "0 chlorophyll https://pokeapi.co/api/v2/ability/34/\n", "1 overgrow https://pokeapi.co/api/v2/ability/65/" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " bulbasaur['ability'].\n", " apply(pd.Series)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Renaeme abiliyt name to avoid with pokemon name" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ability_nameurl
0chlorophyllhttps://pokeapi.co/api/v2/ability/34/
1overgrowhttps://pokeapi.co/api/v2/ability/65/
\n", "
" ], "text/plain": [ " ability_name url\n", "0 chlorophyll https://pokeapi.co/api/v2/ability/34/\n", "1 overgrow https://pokeapi.co/api/v2/ability/65/" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " bulbasaur['ability'].\n", " apply(pd.Series).\n", " rename_axis({'name': 'ability_name'}, axis=1)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Join with origanl DataFrame and rearrange column order" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameslotis_hiddenability_nameurl
01bulbasaur3Truechlorophyllhttps://pokeapi.co/api/v2/ability/34/
11bulbasaur1Falseovergrowhttps://pokeapi.co/api/v2/ability/65/
\n", "
" ], "text/plain": [ " id name slot is_hidden ability_name \\\n", "0 1 bulbasaur 3 True chlorophyll \n", "1 1 bulbasaur 1 False overgrow \n", "\n", " url \n", "0 https://pokeapi.co/api/v2/ability/34/ \n", "1 https://pokeapi.co/api/v2/ability/65/ " ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " bulbasaur['ability'].\n", " apply(pd.Series).\n", " rename_axis({'name': 'ability_name'}, axis=1).\n", " join(bulbasaur.drop('ability', axis=1)).\n", " filter(['id', 'name', 'slot', 'is_hidden', 'ability_name', 'url'])\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }