Further Reading

Much of the material for learning Python for statistics and data science can be found online, but some of us still enjoy reading books. For diehard bibliophiles, these are the books I referred to for the course and enjoyed reading. They are listed in roughly the same order as the course lectures.

  • Python in a Nutshell by Steve Holden, Anna Ravenscroft and Alex Martelli (3rd edition)

    A really nice reference for Python 3.

  • Python Cookbook: Recipes for Mastering Python 3 by David Beazley and Brian K. Jones (3rd edition)

    When you are stuck on a specific task and Stack Overflow is not working for you.

  • Learning IPython for Interactive Computing and Data Visualization by Cyrille Rossant (2nd edition)

    If you want to master Jupyter.

  • Fluent Python by Luciano Ramalho

    Awesome resource for learning how to code in idiomatic Python like a Pythonista.

  • Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

    Pandas from its creator. A little dated, but you can supplement it with the online material.

  • Python Data Science Handbook by Jake VanderPlas

    Still a work in progress, but it looks like the single best book for this course.

  • Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference by Cameron Davidson-Pilon

    Examples of how to use PyMC3.

  • High Performance Python by Micha Gorelick and Ian Ozsvald

    Make your Python code faster.

  • Cython: A Guide for Python Programmers by Kurt W. Smith

    If you want to master Cython, this book is your guide.

  • 21st Century C: C Tips from the New School by Ben Klemens (2nd edition)

    Modern C for statisticians.

  • Guide to Scientific Computing in C++ by Joe Pitt-Francis and Jonathan Whiteley

    Fairly gentle introduction with a section on linear algebra and implementation of conjugate gradient in C++. Much more focus on object-oriented programming than in this course. You can get the e-book free via Duke.

  • Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers by Peter Gottschling

    Awesome introduction to modern C++ (C++11 and C++14) for numerical work. Possibly too dense if you don’t already have some familiarity with C/C++.

  • Managing Projects with GNU Make by Robert Mecklenburg (3rd edition)

    Guide to using make. Free.

  • Learning Spark: Lightning-Fast Big Data Analysis by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia (unfortunately, Spark books tend to be outdated the moment they are printed; this edition covers Spark 1.3 and we are already at Spark 1.6)

    Introduction to Spark with Java, Scala and Python examples.

  • Data Algorithms: Recipes for Scaling Up with Hadoop and Spark by Mahmoud Parsian

    Sort of a cookbook with examples in Hadoop and Spark. Emphasis on biomedical applications.

  • Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data by Kyran Dale

    Still in early stages, but looks very promising. If I ever include lectures on data visualization, I suspect this book will be my reference.
