Computational Statistics in Python
0.1
Introduction to Python
Variables
Operators
Iterators
Conditional Statements
Functions
Strings and String Handling
Lists, Tuples, Dictionaries
Classes
Modules
The standard library
Keeping the Anaconda distribution up-to-date
Exercises
Getting started with Python and the IPython notebook
Cells
Code Cells
Magic Commands
Python as Glue
Python <-> R <-> Matlab <-> Octave
More Glue: Julia and Perl
Functions are first class objects
Function arguments
Call by “object reference”
Binding of default arguments occurs at function definition
Higher-order functions
Anonymous functions
Pure functions
Recursion
Iterators
Generators
Generators and comprehensions
Utilities - enumerate, zip and the ternary if-else operator
Decorators
The operator module
The functools module
The itertools module
The toolz, fn and funcy modules
Exercises
Data science is OSEMN
Obtaining data
Scrubbing data
Exercises
Working with text
String methods
Splitting and joining strings
The string module
Regular expressions
The NLTK toolkit
Exercises
Preprocessing text data
Example: Counting words in a document
Working with structured data
Using SQLite3
Basic concepts of database normalization
Using HDF5
Interfacing with Pandas
Using numpy
References
Example
NDArray
Broadcasting, row, column and matrix operations
Universal functions (Ufuncs)
Generalized ufuncs
Random numbers
Linear algebra
Exercises
Using Pandas
Series
DataFrame
Panels
Split-Apply-Combine
Using statsmodels
Using R from IPython
Using Rmagic
Using R from pandas
Computational problems in statistics
Textbook example - is the coin fair?
Bayesian approach
Comment
Computer numbers and mathematics
Some examples of numbers behaving badly
Finite representation of numbers
Using arbitrary precision libraries
From numbers to Functions: Stability and conditioning
Exercises
Algorithmic complexity
Profiling and benchmarking
Measuring algorithmic complexity
Space complexity
Linear Algebra and Linear Systems
Simultaneous Equations
Linear Independence
Norms and Distance of Vectors
Trace and Determinant of Matrices
Column space, Row space, Rank and Kernel
Matrices as Linear Transformations
Matrix Norms
Special Matrices
Exercises
Linear Algebra and Matrix Decompositions
Large Linear Systems
Example: Netflix Competition (circa 2006-2009)
Matrix Decompositions
Matrix Decompositions for PCA and Least Squares
Singular Value Decomposition
Stability and Condition Number
Exercises
Change of Basis
Variance and covariance
Eigendecomposition of the covariance matrix
PCA
Change of basis via PCA
Graphical illustration of change of basis
Dimension reduction via PCA
Using Singular Value Decomposition (SVD) for PCA
Optimization and Non-linear Methods
Example: Maximum Likelihood Estimation (MLE)
Bisection Method
Secant Method
Newton-Raphson Method
Gauss-Newton
Inverse Quadratic Interpolation
Brent’s Method
Practical Optimization Routines
Finding roots
Optimization Primer
Using scipy.optimize
Gradient descent
Newton’s method and variants
Constrained optimization
Curve fitting
Finding parameters for ODE models
Optimization of graph node placement
Optimization of standard statistical models
Fitting ODEs with the Levenberg–Marquardt algorithm
1D example
2D example
Algorithms for Optimization and Root Finding for Multivariate Problems
Optimizers
Solvers
GLM Estimation and IRLS
Expectation Maximization (EM) Algorithm
Jensen’s inequality
Maximum likelihood with complete information
Incomplete information
Gaussian mixture models
Using EM
Vectorized version
Vectorization with Einstein summation notation
Comparison of EM routines
Monte Carlo Methods
Pseudorandom number generators (PRNG)
Monte Carlo swindles (Variance reduction techniques)
Quasi-random numbers
Resampling methods
Resampling
Simulations
Setting the random seed
Sampling with and without replacement
Calculation of Cook’s distance
Permutation resampling
Design of simulation experiments
Example: Simulations to estimate power
Check with R
Estimating the CDF
Estimating the PDF
Kernel density estimation
Multivariate kernel density estimation
Markov Chain Monte Carlo (MCMC)
Bayesian Data Analysis
Metropolis-Hastings sampler
Gibbs sampler
Slice sampler
Hierarchical models
Using PyMC2
Coin toss
Estimating mean and standard deviation of normal distribution
Estimating parameters of a linear regression model
Estimating parameters of a logistic model
Using a hierarchical model
Using PyMC3
Coin toss
Estimating mean and standard deviation of normal distribution
Estimating parameters of a linear regression model
Estimating parameters of a logistic model
Using a hierarchical model
Using PyStan
References
Simple Logistic model
Animations of Metropolis, Gibbs and Slice Sampler dynamics
C Crash Course
Hello world
A tutorial example - coding a Fibonacci function in C
Types in C
Operators
Control of program flow
Arrays and pointers
Functions
Function pointers
Using make to compile C programs
Exercise
Code Optimization
Profiling
Using better algorithms and data structures
I/O Bound problems
Problem set for optimization
Using C code in Python
Example: The Fibonacci Sequence
Using clang and bitey
Using gcc and ctypes
Using Cython
Benchmark
Using functions from various compiled languages in Python
C
C++
Fortran
Benchmarking
Wrapping a function from a C library for use in Python
Wrapping functions from a C++ library for use in Python
Julia and Python
Defining a function in Julia
Using it in Python
Using Python libraries in Julia
Converting Python Code to C for speed
Example: Fibonacci
Example: Matrix multiplication
Example: Pairwise distance matrix
Profiling code
Numba
Cython
Comparison with optimized C from scipy
Optimization bake-off
Python version
Numpy version
Numexpr version
Numba version
NumbaPro version
Parakeet version
Cython version
C version
C++ version
Fortran version
Bake-off
Summary
Recommendations for optimizing Python code
Writing Parallel Code
Concepts
Embarrassingly parallel programs
Using Multiprocessing
Using IPython parallel for interactive parallel computing
Other parallel programming approaches not covered
References
Massively parallel programming with GPUs
Programming GPUs
GPU Architecture
CUDA Python
Getting Started with CUDA
Vector addition - the ‘Hello, world’ of CUDA
Performing a reduction on CUDA
Recreational
More examples
Writing CUDA in C
Review of GPU Architecture - A Simplification
CUDA C program - an Outline
Distributed computing for Big Data
Why and when does distributed computing matter?
Ingredients for efficient distributed computing
What is Hadoop?
Review of functional programming
The Hadoop MapReduce workflow
Using Hadoop MapReduce
Spark
Hadoop MapReduce on AWS EMR with mrjob
MapReduce code
Configuration file
Launching job
Spark on a local machine using 4 nodes
Using Spark in standalone programs
Introduction to Spark concepts with a data manipulation example
Using the MLlib for Regression
References
Modules and Packaging
Modules
Distributing your package
Tour of the Jupyter (IPython3) notebook
Installing Jupyter
Installing other kernels
Installing extensions
Installing Python3 while keeping Python2
Now, restart your notebook server
Polyglot programming
Python 2
Python 3
Bash
R
Scala
Julia
Processing
What you should know and learn more about
Statistical foundations
Computing foundations
Mathematical foundations
Statistical algorithms
Libraries worth knowing about after numpy, scipy and matplotlib
Wrapping R libraries with Rpy