S01 Relational Databases¶
For a simple tutorial on database design, see Introduction to Database Design
For a deep dive, see Database Design for Mere Mortals
0. Packages for working with relational databases in Python¶
- Python Database API Specification v2.0 - The standard Python Database API
- sqlite3 - API
for builit-in
sqlite3
package - Database drivers - For connecting to other databases
- ipython-sql - SQL magic in Jupyter
- SQLAlchemy - Most well-known Object Relational Mapper (ORM)
- Pony ORM - Alternative ORM
2. RDBMS¶
- Memory
- Storage
- Dictionary
- Query language
3. Anatomy¶
Table (Relation)¶
Represents a subject or an event.
Column (Attribute)¶
Represents a single variable or feature.
Row (Tuple)¶
Represents an observation.
4. Concepts¶
Constraints¶
You can impose constraints that values in a column have to take. For example, you can specify that values are compulsory (NOT NULL), or UNIQUE or fall within a certain range.
Referential integrity¶
- Primary key represents a unique identifier of a row. It may be simple or composite.
- Unique
- Non-null
- Never optional
- Foreign key is a column containing the primary key of a different table. It enforces referential integrity.
Relationships¶
- One to one
- One to many
- Many to many
- What happens on delete?
- Restrict
- Cascade
Indexes¶
An index is a data structure that allows fast search of a column (typically from linear to log time complexity). Most databases will automatically build an index for every primary key column, but you can also manually specify columns to build indexes for.
Views¶
- Temporary virtual table retuned as a result of a query.
- Views only specify the strucutre of a table - the contents are constructed on the fly from existing tables.
- Queries return a Result Set
5. Design¶
Columns¶
- Use singlular form for name
- Use informative names
- Use unique names not shared by any other table (except foreign keys)
- Column must be an attribute of the table’s subject
- Eliminate multi-part columns
- Eliminate multi-value columsn
- Eliminate redundant columns
Tables¶
- Use singular/plural forms for name (controversial)
- Enusre every table has a primary key
- Eliminate duplicate columns
Relationships¶
- Establish participation type and degree of relationship
- One to one
- One to many
- Many to many
6. Example¶
Use sqlmagic
as alternative to using sqlite3
driver.
In [1]:
%load_ext sql
Connect to SQLite3 database on disk (creates it if it does not exist)
In [2]:
%sql sqlite:///data/dummy.db
Out[2]:
'Connected: None@data/dummy.db'
SQL for table deletion and creation
In [3]:
%%sql
DROP TABLE IF EXISTS Country;
DROP TABLE IF EXISTS Person;
CREATE TABLE Country (
country_id varcarh(2) PRIMARY KEY,
country_name varchar(255)
);
CREATE TABLE Person (
person_id INTEGER PRIMARY KEY,
person_first varchar(255),
person_last varchar(255),
country_id INTEGER NOT NULL,
FOREIGN KEY (country_id) REFERENCES Country(country_id)
);
Done.
Done.
Done.
Done.
Out[3]:
[]
SQL to insert rows.
In [4]:
%%sql
INSERT INTO Country(country_id, country_name)
VALUES ('FR', 'France'), ('CU', 'CUBA');
2 rows affected.
Out[4]:
[]
In [5]:
%%sql
INSERT INTO Person(person_first, person_last, country_id)
VALUES
('Napolean', 'Bonaparte', 'FR'),
('Luis','Alvarez', 'CU');
2 rows affected.
Out[5]:
[]
Accessing the RDBMS dictionary.
In [6]:
%%sql
SELECT name FROM sqlite_master
WHERE type = "table";
Done.
Out[6]:
name |
---|
Country |
Person |
In [7]:
%%sql
SELECT sql FROM sqlite_master
WHERE name='Person';
Done.
Out[7]:
sql |
---|
CREATE TABLE Person ( person_id INTEGER PRIMARY KEY, person_first varchar(255), person_last varchar(255), country_id INTEGER NOT NULL, FOREIGN KEY (country_id) REFERENCES Country(country_id) ) |
SQL as a Query Language.
In [8]:
%%sql
SELECT person_first as first, person_last AS last, country_name AS nationality
FROM Person
INNER JOIN country
ON Person.country_id = Country.country_id;
Done.
Out[8]:
first | last | nationality |
---|---|---|
Napolean | Bonaparte | France |
Luis | Alvarez | CUBA |
Visualizing the entitry-relationship diagram (ERd).
In [10]:
import os
from eralchemy import render_er
if not os.path.exists('erd_from_sqlalchemy.png'):
render_er('sqlite:///data/dummy.db', 'erd_from_sqlalchemy.png')

Homework walk-through¶
Convert the flat file data in data/flat.csv
into a well-structured
relational database in SQLite3 stored as data/faculty.db
. Note -
salary information is confidential and should be kept in a separate
table from other personal data.
In [11]:
import pandas as pd
In [12]:
flat = pd.read_csv('data/flat.csv', keep_default_na=False)
flat.sample(10)
Out[12]:
name | gender | age | height | weight | salary | nationality | code | country | language1 | language2 | language3 | first | last | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | Adan Brown | Male | 49 | 1.81 | 68 | 63000 | Belgian | BE | Belgium | F# | Adan | Brown | ||
859 | Lester Monroe | Male | 18 | 1.81 | 58 | 43000 | Australian | AU | Australia | Lua | Go | Lester | Monroe | |
28 | Aleshia Reilly | Female | 61 | 1.76 | 86 | 97000 | Swiss | CH | Switzerland | Aleshia | Reilly | |||
743 | Kathaleen Faulkner | Female | 52 | 1.97 | 73 | 56000 | Japanese | JP | Japan | PowerShell | Kathaleen | Faulkner | ||
255 | Collin Kane | Male | 47 | 1.87 | 65 | 84000 | Venezuelan | VE | Venezuela | Collin | Kane | |||
800 | Ladawn Koch | Female | 24 | 1.83 | 50 | 64000 | German | DE | Germany | AutoIt | J# | TypeScript | Ladawn | Koch |
122 | Bari Colon | Female | 65 | 1.60 | 62 | 116000 | Ukrainian | UA | Ukraine | PowerShell | Tcl | Bari | Colon | |
253 | Coletta Vega | Female | 57 | 1.87 | 44 | 94000 | Brazilian | BR | Brazil | C++ | Coletta | Vega | ||
1343 | Stephan Mcfadden | Male | 40 | 1.74 | 54 | 149000 | Cuban | CU | Cuba | Awk | Stephan | Mcfadden | ||
551 | Gonzalo House | Male | 42 | 1.81 | 44 | 114000 | Danish | DK | Denmark | AutoIt | Perl | Gonzalo | House |