Spark magic
In [1]:
%load_ext sparkmagic.magics
In [2]:
%manage_spark
Starting Spark application
SparkSession available as 'spark'.
In [11]:
%spark info
Info for running Spark:
Sessions:
Name: s1 Session id: 134 YARN id: application_1522938745830_0059 Kind: pyspark State: idle
Spark UI: http://vcm-2168.oit.duke.edu:8088/proxy/application_1522938745830_0059/
Driver Log: http://vcm-3544.oit.duke.edu:8042/node/containerlogs/container_e19_1522938745830_0059_01_000001/user06021
Session configs:
{'driverMemory': '2048M', 'executorCores': 2, 'proxyUser': 'user06021', 'conf': {'spark.master': 'yarn-client'}}
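sparkmagic creates its sessions through a Livy server, and the session configs printed above use the same field names as a Livy session-creation request body. As a rough sketch only (the `kind` field and the exact payload sparkmagic sends are assumptions here, not taken from this notebook), the same settings could be serialized to JSON like this:

```python
import json

# Session settings copied from the `%spark info` output above.
session_configs = {
    "driverMemory": "2048M",
    "executorCores": 2,
    "proxyUser": "user06021",
    "conf": {"spark.master": "yarn-client"},
}

# Assumption: the session kind ("pyspark", as reported by `%spark info`)
# travels in the same request body as the other settings.
payload = {"kind": "pyspark", **session_configs}

# Serialize with sorted keys so the output is stable and easy to diff.
body = json.dumps(payload, indent=2, sort_keys=True)
print(body)
```

Keeping the settings in a plain dictionary makes it easy to review or adjust values such as `driverMemory` before a session is started.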
In [24]:
%%spark -o foo
foo = spark.read.parquet('foo.parquet')
foo.show(4)
+-------+--------+-------+-----+----+---+
|   name|semester|subject|score| sex|age|
+-------+--------+-------+-----+----+---+
|    bob|    fall|  stats|   92|male| 19|
|    bob|  summer|  stats|  100|male| 19|
|    bob|  spring|  stats|  100|male| 19|
|charles|  spring|  stats|   88|male| 22|
+-------+--------+-------+-----+----+---+
only showing top 4 rows
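Export data to a pandas DataFrame: the `-o foo` option on the `%%spark` cell above copies the Spark DataFrame `foo` into the local notebook as a pandas DataFrame of the same name.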
In [25]:
foo
Out[25]:
 | name | semester | subject | score | sex | age |
---|---|---|---|---|---|---|
0 | bob | fall | stats | 92.0 | male | 19 |
1 | bob | summer | stats | 100.0 | male | 19 |
2 | bob | spring | stats | 100.0 | male | 19 |
3 | charles | spring | stats | 88.0 | male | 22 |
4 | charles | fall | bio | 100.0 | male | 22 |
5 | ann | spring | math | 98.0 | female | 23 |
6 | ann | fall | bio | 50.0 | female | 23 |
7 | daivd | NaN | NaN | NaN | male | 23 |
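Once the data is local, ordinary pandas operations apply to it. The sketch below rebuilds the displayed rows by hand so that it runs without a Spark session; the name `foo_local` is hypothetical, and in the notebook itself the exported `foo` could be used directly. It computes the mean score per student and lists the rows that have no recorded score.

```python
import numpy as np
import pandas as pd

# Rows copied verbatim from the Out[25] display above, including the NaN row.
# `foo_local` is a hypothetical stand-in for the exported `foo`.
foo_local = pd.DataFrame({
    "name": ["bob", "bob", "bob", "charles", "charles", "ann", "ann", "daivd"],
    "semester": ["fall", "summer", "spring", "spring", "fall", "spring", "fall", np.nan],
    "subject": ["stats", "stats", "stats", "stats", "bio", "math", "bio", np.nan],
    "score": [92.0, 100.0, 100.0, 88.0, 100.0, 98.0, 50.0, np.nan],
    "sex": ["male", "male", "male", "male", "male", "female", "female", "male"],
    "age": [19, 19, 19, 22, 22, 23, 23, 23],
})

# Mean score per student. NaN scores are skipped, so a student whose
# scores are all missing ends up with a NaN mean.
mean_score = foo_local.groupby("name")["score"].mean()
print(mean_score)

# Names of students that have at least one row without a score.
missing = foo_local[foo_local["score"].isna()]
print(missing["name"].tolist())
```

The `score` column appears as floats (92.0 rather than 92) in the local frame because pandas needs a floating-point column to hold the `NaN` in the last row.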