{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 2: Data formats\n", "\n", "This is a review of basic Python from BIOS 821 as well as practice manipulating data formats common in medical data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Text**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**1**. (10 points)\n", "\n", "Read the text file `data/s01/alice.txt` and count the number of occurrences of 'Alice' in the text." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**JSON**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2**. (20 points)\n", "\n", "Use `curl` to download the file at `https://swapi.co/api/people/1/` as `luke.json` in `data/s01`. Find the body mass index (BMI) of Luke Skywalker, rounded to 1 decimal place, and print the BMI category for Luke.\n", "\n", "BMI Categories: \n", "\n", "- Underweight = <18.5\n", "- Normal weight = 18.5–24.9 \n", "- Overweight = 25–29.9 \n", "- Obesity = BMI of 30 or greater\n", "\n", "Note: Use the `json` package." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**XML**\n", "\n", "**3**. (20 points)\n", "\n", "Read the XML file in `data/s01/patient.xml` and find all the unique FHIR tags used. FHIR tags start with `{http://hl7.org/fhir}`.\n", "\n", "Note: Use the `xml.etree` package." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Time series (structured data)**\n", "\n", "**4**. (20 points)\n", "\n", "Read the worksheet `Tourist arrivals` in the file `data/s01/touristexp.xls` into a `pandas` data frame. Drop any rows with missing values. 
Show a table of arrivals to `United States` where the rows are the `Region or origin` and the columns are years." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Image**\n", "\n", "**5**. (20 points)\n", "\n", "Use the `imread` function to read in the JPG image `data/s01/pony.jp` as a `numpy` array. What are the dimensions of the array? Display the image using `matplotlib`. Set all values in the `red` channel to 0. Redisplay the image. Make the rectangular region with width between 300 and 400 pixels and height between 200 and 300 pixels black. Redisplay the image. \n", "\n", "Note: In NumPy indexing, the first dimension corresponds to rows, while the second corresponds to columns, with the origin at the top-left corner. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Genomics data**\n", "\n", "**6**. (10 points)\n", "\n", "Join the first 10 lines containing sequence data of the E. coli genome found in FASTA file `data/s01/ecoli.fna` into a single string. Note that header lines start with '>' in the FASTA format. Print the reverse complement of the joined sequence in lines of length 80. \n", "\n", "Use `textwrap` to format the fixed width output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }