BIOS-823-2020 Statistical Programming for Biomedical Big Data
Contents:
- Python review of concepts
- Using numpy
- Data Science: Data processing
- Introduction to pandas
- Series and Data Frames
- Creating Data Frames
- Indexing Data Frames
- Structure of a Data Frame
- Selecting, Renaming and Removing Columns
- Selecting, Renaming and Removing Rows
- Transforming and Creating Columns
- Sorting Data Frames
- Summarizing
- Split-Apply-Combine
- Combining Data Frames
- Fixing common DataFrame issues
- Reshaping Data Frames
- Pivoting
- Functional style - apply, applymap and map
- Chaining commands
- Moving between R and Python in Jupyter
- Introduction to
- Exploratory visualization in pandas
- Graphics and Visualization in Python
- Saving and sharing data
- Normalization
- Final tables
- Denormalization
- Relational Databases Overview
- SQL Queries 01
- SQL Queries 02
- MongoDB
- Redis
- Graph concepts
- Graph Algorithms
- Machine Learning for Data Scientists
- ML model examples
- Dimension Reduction
- Unsupervised Learning
- Clustering
- Processing data
- Basic inspection
- Detailed inspection
- Create new features
- Drop features
- Inspect for missing data
- Fill in missing values for categorical values
- Tangent: catboost is nice
- Category encoding
- Split data into train and test data sets
- Category encoding
- Impute missing numeric values
- Standardize data
- Save processed data for future use
- Data
- Classification
- Imbalanced data
- Simulate an imbalanced data set
- Collect more data
- Use evaluation metrics that are less sensitive to imbalance
- Over-sample the minority class
- Under-sample the majority class
- Combine over- and under-sampling
- Use class weights to adjust the loss function
- Use a classifier that is less sensitive to imbalance
- Hyperparameter tuning
- Interpretable ML
- Functional programming in Python (operator, functools, itertools, toolz)
- TensorFlow
- Deep Learning Models
- Spark Low-Level API
- Spark High-Level API
- Using Spark Efficiently
- Spark SQL
- Spark GraphFrames
- Spark MLLib
- Set up Spark and Spark SQL contexts
- Vectors
- Manual construction of an ML DataFrame
- Using VectorAssembler
- Generating simple statistics
- Split data
- Encoding categorical features
- Scaling
- Dimension reduction
- Clustering
- Model evaluation
- Pipelines
- Hyperparameter optimization
- Using Spark with a non-MLlib classifier
- Spark Structured Streaming