S01 Relational Databases

For a simple tutorial on database design, see Introduction to Database Design

For a deep dive, see Database Design for Mere Mortals

0. Packages for working with relational databases in Python

1. Motivation

Why relational databases and SQL?

  • History of databases
  • ACID
  • Data integrity
  • Schema

2. RDBMS

  • Memory
  • Storage
  • Dictionary
  • Query language

3. Anatomy

Table (Relation)

Represents a subject or an event.

Column (Attribute)

Represents a single variable or feature.

Row (Tuple)

Represents an observation.

4. Concepts

Constraints

You can impose constraints that values in a column have to take. For example, you can specify that values are compulsory (NOT NULL), or UNIQUE or fall within a certain range.

Referential integrity

  • Primary key represents a unique identifier of a row. It may be simple or composite.
  • Unique
  • Non-null
  • Never optional
  • Foreign key is a column containing the primary key of a different table. It enforces referential integrity.

Relationships

  • One to one
  • One to many
  • Many to many
  • What happens on delete?
  • Restrict
  • Cascade

Indexes

An index is a data structure that allows fast search of a column (typically from linear to log time complexity). Most databases will automatically build an index for every primary key column, but you can also manually specify columns to build indexes for.

Views

  • Temporary virtual table retuned as a result of a query.
  • Views only specify the strucutre of a table - the contents are constructed on the fly from existing tables.
  • Queries return a Result Set

5. Design

Columns

  • Use singlular form for name
  • Use informative names
  • Use unique names not shared by any other table (except foreign keys)
  • Column must be an attribute of the table’s subject
  • Eliminate multi-part columns
  • Eliminate multi-value columsn
  • Eliminate redundant columns

Tables

  • Use singular/plural forms for name (controversial)
  • Enusre every table has a primary key
  • Eliminate duplicate columns

Relationships

  • Establish participation type and degree of relationship
  • One to one
  • One to many
  • Many to many

6. Example

Use sqlmagic as alternative to using sqlite3 driver.

In [1]:
%load_ext sql

Connect to SQLite3 database on disk (creates it if it does not exist)

In [2]:
%sql sqlite:///data/dummy.db
Out[2]:
'Connected: None@data/dummy.db'

SQL for table deletion and creation

In [3]:
%%sql

DROP TABLE IF EXISTS Country;
DROP TABLE IF EXISTS Person;

CREATE TABLE Country (
    country_id varcarh(2) PRIMARY KEY,
    country_name varchar(255)
);

CREATE TABLE Person (
    person_id INTEGER PRIMARY KEY,
    person_first varchar(255),
    person_last varchar(255),
    country_id INTEGER NOT NULL,
      FOREIGN KEY (country_id) REFERENCES Country(country_id)
);
Done.
Done.
Done.
Done.
Out[3]:
[]

SQL to insert rows.

In [4]:
%%sql

INSERT INTO Country(country_id, country_name)
VALUES ('FR', 'France'), ('CU', 'CUBA');
2 rows affected.
Out[4]:
[]
In [5]:
%%sql

INSERT INTO Person(person_first, person_last, country_id)
VALUES
('Napolean', 'Bonaparte', 'FR'),
('Luis','Alvarez', 'CU');
2 rows affected.
Out[5]:
[]

Accessing the RDBMS dictionary.

In [6]:
%%sql

SELECT name FROM sqlite_master
WHERE type = "table";
Done.
Out[6]:
name
Country
Person
In [7]:
%%sql

SELECT sql FROM sqlite_master
WHERE name='Person';
Done.
Out[7]:
sql
CREATE TABLE Person (
person_id INTEGER PRIMARY KEY,
person_first varchar(255),
person_last varchar(255),
country_id INTEGER NOT NULL,
FOREIGN KEY (country_id) REFERENCES Country(country_id)
)

SQL as a Query Language.

In [8]:
%%sql

SELECT person_first as first, person_last AS last, country_name AS nationality
FROM Person
INNER JOIN country
ON Person.country_id = Country.country_id;
Done.
Out[8]:
first last nationality
Napolean Bonaparte France
Luis Alvarez CUBA

Visualizing the entitry-relationship diagram (ERd).

In [10]:
import os
from eralchemy import render_er

if not os.path.exists('erd_from_sqlalchemy.png'):
    render_er('sqlite:///data/dummy.db', 'erd_from_sqlalchemy.png')

Homework walk-through

Convert the flat file data in data/flat.csv into a well-structured relational database in SQLite3 stored as data/faculty.db. Note - salary information is confidential and should be kept in a separate table from other personal data.

In [11]:
import pandas as pd
In [12]:
flat = pd.read_csv('data/flat.csv', keep_default_na=False)
flat.sample(10)
Out[12]:
name gender age height weight salary nationality code country language1 language2 language3 first last
7 Adan Brown Male 49 1.81 68 63000 Belgian BE Belgium F# Adan Brown
859 Lester Monroe Male 18 1.81 58 43000 Australian AU Australia Lua Go Lester Monroe
28 Aleshia Reilly Female 61 1.76 86 97000 Swiss CH Switzerland Aleshia Reilly
743 Kathaleen Faulkner Female 52 1.97 73 56000 Japanese JP Japan PowerShell Kathaleen Faulkner
255 Collin Kane Male 47 1.87 65 84000 Venezuelan VE Venezuela Collin Kane
800 Ladawn Koch Female 24 1.83 50 64000 German DE Germany AutoIt J# TypeScript Ladawn Koch
122 Bari Colon Female 65 1.60 62 116000 Ukrainian UA Ukraine PowerShell Tcl Bari Colon
253 Coletta Vega Female 57 1.87 44 94000 Brazilian BR Brazil C++ Coletta Vega
1343 Stephan Mcfadden Male 40 1.74 54 149000 Cuban CU Cuba Awk Stephan Mcfadden
551 Gonzalo House Male 42 1.81 44 114000 Danish DK Denmark AutoIt Perl Gonzalo House