STA-663-2017
Contents:
- Notes on using Jupyter
- Introduction to Python
- Functions
- Classes
- Strings
- Get “Through the Looking Glass”
- Slice to get Jabberwocky
- Find palindromic words in poem if any
- Top 10 most frequent words
- Words that appear exactly twice
- Trigrams
- Find words in poem that are over-represented
- Encode and decode poem using a Caesar cipher
- Using Regular Expressions
- Natural language processing
- String formatting
- Using numpy
- Resources
- Array creation
- Array manipulation
- Array indexing
- Calculations and broadcasting
- Combining and splitting arrays
- Reductions
- Example: Calculating pairwise distance matrix using broadcasting and vectorization
- Example: Constructing leave-one-out arrays
- Generalized ufuncs
- Saving and loading NDArrays
- Version information
- Graphics in Python
- Data
- SQL
- Machine Learning with sklearn
- Code Optimization
- Just-in-time compilation (JIT)
- Cython
- Parallel Programming
- Multi-Core Parallelism
- Vanilla Python
- Using numba to speed up computation
- Using cython to speed up computation
- The concurrent.futures module
- Using processes in parallel with ProcessPoolExecutor
- Using threads in parallel with ThreadPoolExecutor
- Turning off the GIL in cython
- Using threads in parallel with ThreadPoolExecutor and nogil
- Using multiprocessing
- Common issues with use of shared memory in parallel programs
- Using ipyparallel
- Using C++
- Using pybind11
- Linear Algebra Review
- Linear Algebra and Linear Systems
- Matrix Decompositions
- Linear Algebra Examples
- Applications of Linear Algebra: PCA
- Sparse Matrices
- Optimization and Root Finding
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Using optimization routines from scipy and statsmodels
- Random numbers and probability models
- Resampling and Monte Carlo Simulations
- Numerical Evaluation of Integrals
- Probabilistic Graphical Models with pgmpy
- Working with large data sets
- Biggish Data
- Efficient storage of data in memory
- Working with large data sets
- Using Spark
- Using Spark Efficiently
- Spark MLlib
- Spark SQL
- Spark Streaming
- Spark on Cloud
- Azure
- AWS
- Know your AWS public and private access keys
- Know your AWS EC2 key-pair
- Install AWS command line client
- Configure the AWS command line client
- Create a cluster
- Get information about the cluster
- Connect to the cluster via ssh
- Note the IP address that is returned
- Run pyspark
- Run the Zeppelin notebook
- Connect to the Zeppelin notebook
- Create notebook and run Spark within it
- Terminate the cluster
- Using PyMC3
- PyStan
- Metropolis and Gibbs Sampling
- Using Auxiliary Variables in MCMC proposals
- TensorFlow and Edward
- Bonus Material: The Humble For Loop
- Bonus Material: Word count
- Symbolic Algebra with sympy