These pages are no longer maintained.
Please use the current version.
Computational Statistics in Python
Notebooks for each topic are in the GitHub repository.
Topics
- Introduction to Python
- Functions
- Strings
  - Get “Through the Looking Glass”
  - Slice to get Jabberwocky
  - Find palindromic words in the poem, if any
  - Top 10 most frequent words
  - Words that appear exactly twice
  - Trigrams
  - Find words in the poem that are over-represented
  - Encode and decode the poem using a Caesar cipher
- Using Regular Expressions
- Natural language processing
- String formatting
- I/O
- Classes
- Using numpy
  - Resources
  - Array creation
  - Array manipulation
  - Array indexing
  - Calculations and broadcasting
  - Combining and splitting arrays
  - Reductions
  - Example: Calculating a pairwise distance matrix using broadcasting and vectorization
  - Example: Constructing leave-one-out arrays
  - Generalized ufuncs
  - Saving and loading NDArrays
  - Version information
- Graphics in Python
- Data
- SQL
- Machine Learning with sklearn
- Linear Algebra Review
- Linear Algebra and Linear Systems
- Matrix Decompositions
- Linear Algebra Examples
- Applications of Linear Algebra: PCA
- Symbolic Algebra with sympy
- Optimization and Root Finding
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Using optimization routines from scipy and statsmodels
- Expectation Maximization
- Random Numbers
- Resampling and Monte Carlo Simulations
- Numerical Evaluation of Integrals
- Metropolis and Gibbs Sampling
- Using Auxiliary Variables in MCMC proposals
- Using PyMC3
- PyStan
- Introduction to C
- Introduction to C++
- Review and Quiz on C and C++
- Code Optimization
- Foreign Function Interface
- Just-in-time compilation (JIT)
- Cython
- Making Python faster
- Comparison of optimization approaches
- Using pybind11
- Parallel Programming
- Multi-Core Parallelism
  - Vanilla Python
  - Using numba to speed up computation
  - Using cython to speed up computation
  - The concurrent.futures module
  - Using processes in parallel with ProcessPoolExecutor
  - Using threads in parallel with ThreadPoolExecutor
  - Turning off the GIL in cython
  - Using threads in parallel with ThreadPoolExecutor and nogil
  - Using multiprocessing
  - Common issues with use of shared memory in parallel programs
- Using ipyparallel
- Biggish Data
- Efficient storage of data in memory
- Introduction to Spark
- Using Spark Efficiently
- Spark SQL
- Spark MLlib
- Spark on Cloud
  - Azure
  - AWS
    - Know your AWS public and private access keys
    - Know your AWS EC2 key pair
    - Install the AWS command line client
    - Configure the AWS command line client
    - Create a cluster
    - Get information about the cluster
    - Connect to the cluster via ssh
    - Note the IP address that is returned
    - Run pyspark
    - Run the Zeppelin notebook
    - Connect to the Zeppelin notebook
    - Create a notebook and run Spark within it
    - Terminate the cluster
- Further Reading