Further Reading
Much of the material for learning Python for statistics and data science can be found online, but some of us still enjoy reading books. For the diehard bibliophiles among us, these are books that I referred to for the course and enjoyed reading. They are listed in roughly the same order as the course lectures.
Python in a Nutshell by Steve Holden, Anna Ravenscroft and Alex Martelli (3rd edition)
A really nice reference for Python 3.
Python Cookbook: Recipes for Mastering Python 3 by David Beazley and Brian K. Jones (3rd edition)
For when you are stuck on a specific task and Stack Overflow is not working for you.
Learning IPython for Interactive Computing and Data Visualization by Cyrille Rossant (2nd edition)
If you want to master Jupyter.
Fluent Python by Luciano Ramalho
Awesome resource for learning how to code in idiomatic Python like a Pythonista.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
Pandas by the developer. A little dated but you can supplement with the online material.
Python Data Science Handbook by Jake VanderPlas
Still a work in progress, but it looks like the single best book for this course.
-
Examples of how to use PyMC3.
High Performance Python by Micha Gorelick and Ian Ozsvald
Make your Python code faster.
Cython: A Guide for Python Programmers by Kurt W Smith
If you want to master Cython, this book is your guide.
21st Century C: C Tips from the New School by Ben Klemens (2nd edition)
Modern C for statisticians.
Guide to Scientific Computing in C++ by Joe Pitt-Francis and Jonathan Whiteley
Fairly gentle introduction with a section on linear algebra and implementation of conjugate gradient in C++. Much more focus on object-oriented programming than in this course. You can get the e-book free via Duke.
Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers by Peter Gottschling
Awesome introduction to modern C++ (C++11 and C++14) for numerical work. Possibly too dense if you don't already have some familiarity with C/C++.
Managing Projects with GNU Make by Robert Mecklenburg (3rd edition)
Guide to using make. Free.
Learning Spark: Lightning-Fast Big Data Analysis by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia (unfortunately, Spark books tend to be outdated the moment they are printed; this edition covers Spark 1.3 and we are already at Spark 1.6)
Introduction to Spark with Java, Scala and Python examples.
Data Algorithms: Recipes for Scaling Up with Hadoop and Spark by Mahmoud Parsian
Sort of a cookbook with examples in Hadoop and Spark. Emphasis on biomedical applications.
Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data by Kyran Dale
Still in early stages, but looks very promising. If I ever include lectures on data visualization, I suspect this book will be my reference.