{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# STA 663 Midterm Exams\n",
    "\n",
    "Please observe the Duke honor code for this **closed book** exam.\n",
    "\n",
    "**Permitted exceptions to the closed book rule**\n",
    "\n",
    "- You may use any of the links accessible from the Help Menu for reference - that is, you may follow a chain of clicks from the landing pages of the sites accessible through the Help Menu. If you find yourself outside the help/reference pages of `python`, `ipython`, `numpy`, `scipy`, `matplotlib`, `sympy`, `pandas`, (e.g. on a Google search page or stackoverflow or current/past versions of the STA 663 notes) you are in danger of violating the honor code and should exit immediately.\n",
    "\n",
    "- You may also use TAB or SHIFT-TAB completion, as well as `?foo`, `foo?` and `help(foo)` for any function, method or class `foo`.\n",
    "\n",
    "The total points allocated is 125, but the maximum possible is 100. Hence it is possible to score 100 even with some errors or incomplete solutions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Imports**\n",
    "\n",
    "All the necessary packages have been imported for you in the Code cells below. You should not need any additional imports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import  matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import scipy.linalg as la\n",
    "from collections import Counter\n",
    "from functools import reduce"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load_ext rpy2.ipython"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1**. (10 points)\n",
    "\n",
    "Read the flights data at https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv into a `pnadas` data frame. Find the average number of passengers per quarter (Q1, Q2, Q3,Q4) across the years 1950-1959 (inclusive of 1950 and 1959), where\n",
    "\n",
    "- Q1 = Jan, Feb, Mar\n",
    "- Q2 = Apr, May, Jun\n",
    "- Q3 = Jul, Aug, Sep\n",
    "- Q4 = Oct, Nov, Dec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2**. (10 points)\n",
    "\n",
    "The Collatz sequence is defined by the following rules for finding the next number\n",
    "\n",
    "```\n",
    "if the current number is even, divide by 2\n",
    "if the current number is odd, multiply by 3 and add 1\n",
    "if the current number is 1, stop\n",
    "```\n",
    "\n",
    "- Find the starting integer that gives the longest Collatz sequence for integers in the range(1, 10000). What is the starting number and length of this Collatz sequence?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3**. (10 points)\n",
    "\n",
    "Recall that a covariance matrix is a matrix whose entries are\n",
    "\n",
    "![img](https://wikimedia.org/api/rest_v1/media/math/render/svg/4df2969e65403dd04f2c64137d21ff59b5f54190)\n",
    "\n",
    "Find the sample covariance matrix of the 4 features of the **iris** data set at http://bit.ly/2ow0oJO using basic `numpy` operations on `ndarrasy`. Do **not** use the `np.cov` or equivalent functions in `pandas` (except for checking). Remember to scale by $1/(n-1)$ for the sample covariance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4**. (10 points)\n",
    "\n",
    "How many numbers in `range(100, 1000)` are divisible by 17 after you square them and add 1? Find this out using only **lambda** functions, **map**, **filter** and **reduce** on `xs`, where `xs = range(100, 10000)`.\n",
    "\n",
    "In pseudo-code, you want to achieve\n",
    "\n",
    "```python\n",
    "xs = range(100, 10000)\n",
    "count(y for y in (x**2 + 1 for x in xs) if y % 17 == 0)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**5**. (20 points)\n",
    "\n",
    "- Given the DNA sequence below, create a $4 \\times 4$ transition matrix $A$ where $A[i,j]$ is the probability of the base $j$ appearing immediately after base $i$. Note that a *base* is one of the four letters `a`, `c`, `t` or `g`. The letters below should be treated as a single sequence, broken into separate lines just for formatting purposes. You should check that row probabilities sum to 1. (10 points)\n",
    "- Find the steady state distribution of the 4 bases from the row stochastic transition matrix - that is the, the values of $x$ for which $x^TA = x$. Find the solution by solving a set of linear equations. Hint: you need to add a constraint on the values of $x$. Only partial credit will be given for other methods of finding the steady state distribution. (10 points)\n",
    "\n",
    "```\n",
    "gggttgtatgtcacttgagcctgtgcggacgagtgacacttgggacgtgaacagcggcggccgatacgttctctaagatc\n",
    "ctctcccatgggcctggtctgtatggctttcttgttgtgggggcggagaggcagcgagtgggtgtacattaagcatggcc\n",
    "accaccatgtggagcgtggcgtggtcgcggagttggcagggtttttgggggtggggagccggttcaggtattccctccgc\n",
    "gtttctgtcgggtaggggggcttctcgtaagggattgctgcggccgggttctctgggccgtgatgactgcaggtgccatg\n",
    "gaggcggtttggggggcccccggaagtctagcgggatcgggcttcgtttgtggaggagggggcgagtgcggaggtgttct\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Part 1**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Part 2**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**6**. (10 points)\n",
    "\n",
    "- Find the matrix $A$ that results in rotating the standard vectors in $\\mathbb{R}^2$ by 30 degrees counter-clockwise and stretches $e_1$ by a factor of 3 and contracts $e_2$ by a factor of $0.5$. \n",
    "- What is the inverse of this matrix? How you find the inverse should reflect your understanding.\n",
    "\n",
    "The effects of the matrix $A$ and $A^{-1}$ are shown in the figure below:\n",
    "\n",
    "![image](./vecs.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**7**. (55 points) \n",
    "\n",
    "We observe some data points $(x_i, y_i)$, and believe that an appropriate model for the data is that\n",
    "\n",
    "$$\n",
    "f(x) = ax^2 + bx^3 + c\\sin{x}\n",
    "$$\n",
    "\n",
    "with some added noise. Find optimal values of the parameters $\\beta = (a, b, c)$ that minimize $\\Vert y - f(x) \\Vert^2$\n",
    "\n",
    "1. using `scipy.linalg.lstsq` (10 points)\n",
    "2. solving the normal equations $X^TX \\beta = X^Ty$ (10 points)\n",
    "3. using `scipy.linalg.svd` (10 points)\n",
    "4. using gradient descent with RMSProp (no bias correction) and starting with an initial value of $\\beta = \\begin{bmatrix}1 & 1 & 1\\end{bmatrix}$. Use a learning rate of 0.01 and 10,000 iterations, and set the $\\beta$ parameter of RMSprop to be 0.9 (this is a different $\\beta$ from the parameters of the function we are minimizing). Running gradient descent should take a few seconds to complete. (25 points)\n",
    "\n",
    "In each case, plot the data and fitted curve using `matplotlib`.\n",
    "\n",
    "Data\n",
    "```\n",
    "x = array([ 3.4027718 ,  4.29209002,  5.88176277,  6.3465969 ,  7.21397852,\n",
    "        8.26972154, 10.27244608, 10.44703778, 10.79203455, 14.71146298])\n",
    "y = array([ 25.54026428,  29.4558919 ,  58.50315846,  70.24957254,\n",
    "        90.55155435, 100.56372833,  91.83189927,  90.41536733,\n",
    "        90.43103028,  23.0719842 ])\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using `lstsq`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using normal equations**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "**Using SVD**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Using Gradient descent with RMSprop**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}