{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# S01 Relational Databases\n", "\n", "For a simple tutorial on database design, see [Introduction to Database Design](https://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html)\n", "\n", "For a deep dive, see [Database Design for Mere Mortals](https://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp/0321884493/ref=dp_ob_title_bk)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Packages for working with relational databases in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Python Database API Specification v2.0](https://www.python.org/dev/peps/pep-0249/) - The standard Python Database API\n", "- [sqlite3](https://docs.python.org/3.7/library/sqlite3.html) - API for builit-in `sqlite3` package\n", "- [Database drivers](https://github.com/vinta/awesome-python#database-drivers) - For connecting to other databases\n", "- [ipython-sql](https://github.com/catherinedevlin/ipython-sql) - SQL magic in Jupyter\n", "- [SQLAlchemy](https://www.sqlalchemy.org) - Most well-known Object Relational Mapper (ORM)\n", "- [Pony ORM](https://ponyorm.com) - Alternative ORM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Motivation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why relational databases and SQL?\n", "\n", "- History of databases\n", "- ACID\n", "- Data integrity\n", "- Schema" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. RDBMS\n", "\n", "- Memory\n", "- Storage\n", "- Dictionary\n", "- Query language" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Anatomy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Table (Relation)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Represents a *subject* or an *event*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Column (Attribute)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Represents a single *variable* or *feature*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Row (Tuple)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Represents an *observation*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Concepts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Constraints\n", "\n", "You can impose constraints that values in a column have to take. For example, you can specify that values are compulsory (NOT NULL), or UNIQUE or fall within a certain range." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Referential integrity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Primary key represents a unique identifier of a row. It may be simple or composite.\n", " - Unique\n", " - Non-null\n", " - Never optional\n", "- Foreign key is a column containing the primary key of a different table. It enforces *referential integrity*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relationships" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- One to one\n", "- One to many\n", "- Many to many\n", "\n", "- What happens on delete?\n", " - Restrict\n", " - Cascade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexes\n", "\n", "An index is a data structure that allows fast search of a column (typically from linear to log time complexity). Most databases will automatically build an index for every primary key column, but you can also manually specify columns to build indexes for. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Views" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Temporary virtual table retuned as a result of a *query*.\n", "- Views only specify the strucutre of a table - the contents are constructed on the fly from existing tables.\n", "- Queries return a Result Set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Use singlular form for name \n", "- Use informative names\n", "- Use unique names not shared by any other table (except foreign keys)\n", "- Column must be an attribute of the table's subject\n", "- Eliminate multi-part columns\n", "- Eliminate multi-value columsn\n", "- Eliminate redundant columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Use singular/plural forms for name (controversial)\n", "- Enusre every table has a primary key\n", "- Eliminate duplicate columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relationships" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Establish participation type and degree of relationship\n", " - One to one\n", " - One to many\n", " - Many to many" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `sqlmagic` as alternative to using `sqlite3` driver." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Connect to SQLite3 database on disk (creates it if it does not exist)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Connected: None@data/dummy.db'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql sqlite:///data/dummy.db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SQL for table deletion and creation" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n", "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "DROP TABLE IF EXISTS Country;\n", "DROP TABLE IF EXISTS Person;\n", "\n", "CREATE TABLE Country (\n", " country_id varcarh(2) PRIMARY KEY,\n", " country_name varchar(255)\n", ");\n", "\n", "CREATE TABLE Person (\n", " person_id INTEGER PRIMARY KEY,\n", " person_first varchar(255),\n", " person_last varchar(255),\n", " country_id INTEGER NOT NULL,\n", " FOREIGN KEY (country_id) REFERENCES Country(country_id)\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SQL to insert rows." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 rows affected.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "INSERT INTO Country(country_id, country_name) \n", "VALUES ('FR', 'France'), ('CU', 'CUBA');" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2 rows affected.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "INSERT INTO Person(person_first, person_last, country_id) \n", "VALUES \n", "('Napolean', 'Bonaparte', 'FR'),\n", "('Luis','Alvarez', 'CU');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing the RDBMS dictionary." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "
name | \n", "
---|
Country | \n", "
Person | \n", "
sql | \n", "
---|
CREATE TABLE Person ( person_id INTEGER PRIMARY KEY, person_first varchar(255), person_last varchar(255), country_id INTEGER NOT NULL, FOREIGN KEY (country_id) REFERENCES Country(country_id) ) | \n",
"
first | \n", "last | \n", "nationality | \n", "
---|---|---|
Napolean | \n", "Bonaparte | \n", "France | \n", "
Luis | \n", "Alvarez | \n", "CUBA | \n", "
\n", " | name | \n", "gender | \n", "age | \n", "height | \n", "weight | \n", "salary | \n", "nationality | \n", "code | \n", "country | \n", "language1 | \n", "language2 | \n", "language3 | \n", "first | \n", "last | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | \n", "Adan Brown | \n", "Male | \n", "49 | \n", "1.81 | \n", "68 | \n", "63000 | \n", "Belgian | \n", "BE | \n", "Belgium | \n", "F# | \n", "\n", " | \n", " | Adan | \n", "Brown | \n", "
859 | \n", "Lester Monroe | \n", "Male | \n", "18 | \n", "1.81 | \n", "58 | \n", "43000 | \n", "Australian | \n", "AU | \n", "Australia | \n", "Lua | \n", "Go | \n", "\n", " | Lester | \n", "Monroe | \n", "
28 | \n", "Aleshia Reilly | \n", "Female | \n", "61 | \n", "1.76 | \n", "86 | \n", "97000 | \n", "Swiss | \n", "CH | \n", "Switzerland | \n", "\n", " | \n", " | \n", " | Aleshia | \n", "Reilly | \n", "
743 | \n", "Kathaleen Faulkner | \n", "Female | \n", "52 | \n", "1.97 | \n", "73 | \n", "56000 | \n", "Japanese | \n", "JP | \n", "Japan | \n", "PowerShell | \n", "\n", " | \n", " | Kathaleen | \n", "Faulkner | \n", "
255 | \n", "Collin Kane | \n", "Male | \n", "47 | \n", "1.87 | \n", "65 | \n", "84000 | \n", "Venezuelan | \n", "VE | \n", "Venezuela | \n", "\n", " | \n", " | \n", " | Collin | \n", "Kane | \n", "
800 | \n", "Ladawn Koch | \n", "Female | \n", "24 | \n", "1.83 | \n", "50 | \n", "64000 | \n", "German | \n", "DE | \n", "Germany | \n", "AutoIt | \n", "J# | \n", "TypeScript | \n", "Ladawn | \n", "Koch | \n", "
122 | \n", "Bari Colon | \n", "Female | \n", "65 | \n", "1.60 | \n", "62 | \n", "116000 | \n", "Ukrainian | \n", "UA | \n", "Ukraine | \n", "PowerShell | \n", "Tcl | \n", "\n", " | Bari | \n", "Colon | \n", "
253 | \n", "Coletta Vega | \n", "Female | \n", "57 | \n", "1.87 | \n", "44 | \n", "94000 | \n", "Brazilian | \n", "BR | \n", "Brazil | \n", "C++ | \n", "\n", " | \n", " | Coletta | \n", "Vega | \n", "
1343 | \n", "Stephan Mcfadden | \n", "Male | \n", "40 | \n", "1.74 | \n", "54 | \n", "149000 | \n", "Cuban | \n", "CU | \n", "Cuba | \n", "Awk | \n", "\n", " | \n", " | Stephan | \n", "Mcfadden | \n", "
551 | \n", "Gonzalo House | \n", "Male | \n", "42 | \n", "1.81 | \n", "44 | \n", "114000 | \n", "Danish | \n", "DK | \n", "Denmark | \n", "AutoIt | \n", "Perl | \n", "\n", " | Gonzalo | \n", "House | \n", "