SQL Queries 01¶
For more SQL examples in the SQLite3 dialect, seee SQLite3 tutorial.
For a deep dive, see SQL Queries for Mere Mortals.
Data¶
[1]:
%load_ext sql
[2]:
%sql sqlite:///data/faculty.db
[3]:
%%sql
SELECT * FROM sqlite_master WHERE type='table';
* sqlite:///data/faculty.db
Done.
[3]:
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | person | person | 2 | CREATE TABLE person ( "index" BIGINT, person_id BIGINT, first TEXT, last TEXT, age BIGINT, height FLOAT, weight BIGINT, country_id TEXT, gender_id BIGINT ) |
table | confidential | confidential | 18 | CREATE TABLE confidential ( "index" BIGINT, person_id BIGINT, salary BIGINT ) |
table | person_language | person_language | 33 | CREATE TABLE person_language ( "index" BIGINT, person_id BIGINT, language_id BIGINT ) |
table | language | language | 50 | CREATE TABLE language ( "index" BIGINT, language_id BIGINT, language_name TEXT ) |
table | gender | gender | 55 | CREATE TABLE gender ( "index" BIGINT, gender_id BIGINT, gender TEXT ) |
table | country | country | 57 | CREATE TABLE country ( "index" BIGINT, country_id TEXT, country TEXT, nationality TEXT ) |
Note: You can save results as a variable
[4]:
%%sql master <<
SELECT * FROM sqlite_master WHERE type='table'
* sqlite:///data/faculty.db
Done.
Returning data to local variable master
[5]:
master.DataFrame()
[5]:
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | person | person | 2 | CREATE TABLE person (\n\t"index" BIGINT, \n\tp... |
1 | table | confidential | confidential | 18 | CREATE TABLE confidential (\n\t"index" BIGINT,... |
2 | table | person_language | person_language | 33 | CREATE TABLE person_language (\n\t"index" BIGI... |
3 | table | language | language | 50 | CREATE TABLE language (\n\t"index" BIGINT, \n\... |
4 | table | gender | gender | 55 | CREATE TABLE gender (\n\t"index" BIGINT, \n\tg... |
5 | table | country | country | 57 | CREATE TABLE country (\n\t"index" BIGINT, \n\t... |
Basic Structure¶
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
WHERE predicate
ORDER BY value_expression
Types¶
Character (Fixed width, variable width)
National Character (Fixed width, variable width)
Binary
Numeric (Exact, Arpproximate)
Boolean
DateTime
Interval
CHAR and NCHAR are vendor-dependent. Sometimes they mean the same thing, and sometimes CHAR means bytes and NCHAR means Unicode.
The SQL standard specifies that character strings and datetime literals are enclosed by single quotes. Two single quotes wihtin a string is intepreted as a literal single quote.
'Gilligan''s island'
The CAST function¶
CAST(X as CHARACTER(10))
Value expression¶
Literal
Column reference
Function
CASES
(Value expression)
which may be prefixed with unary operators -
and +
and combined with binary operators appropriate for the data type.
Literal
[6]:
%sql SELECT 23
* sqlite:///data/faculty.db
Done.
[6]:
23 |
---|
23 |
Column reference
[7]:
%sql SELECT first, last FROM person LIMIT 3
* sqlite:///data/faculty.db
Done.
[7]:
first | last |
---|---|
Aaron | Alexander |
Aaron | Kirby |
Abram | Allen |
Function
[8]:
%sql SELECT count(*) FROM person
* sqlite:///data/faculty.db
Done.
[8]:
count(*) |
---|
1523 |
Cases
[9]:
%%sql
SELECT first, last, age,
CASE
WHEN age < 50 THEN 'Whippernapper'
WHEN age < 70 THEN 'Old codger'
ELSE 'Dinosaur'
END comment
FROM person
LIMIT 4
* sqlite:///data/faculty.db
Done.
[9]:
first | last | age | comment |
---|---|---|---|
Aaron | Alexander | 54 | Old codger |
Aaron | Kirby | 59 | Old codger |
Abram | Allen | 41 | Whippernapper |
Abram | Boyer | 45 | Whippernapper |
Value expression
[10]:
%%sql
SELECT first || ' ' || last AS name, age, age - 10 AS fake_age
FROM person
LIMIT 3
* sqlite:///data/faculty.db
Done.
[10]:
name | age | fake_age |
---|---|---|
Aaron Alexander | 54 | 44 |
Aaron Kirby | 59 | 49 |
Abram Allen | 41 | 31 |
Bineary operators¶
Concatenation¶
A || B
Mathematical¶
A + B
A - B
A * B
A / B
Data and time arithmetic¶
'2018-08-29' + 3
'11:59' + '00:01'
[11]:
%%sql
SELECT DISTINCT language_name
FROM language
LIMIT 5;
* sqlite:///data/faculty.db
Done.
[11]:
language_name |
---|
PHP |
Clojure |
Dylan |
GNU Octave |
D |
Sorting¶
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
ORDER BY value_expression
[12]:
%%sql
SELECT DISTINCT language_name
FROM language
ORDER BY language_name ASC
LIMIT 5;
* sqlite:///data/faculty.db
Done.
[12]:
language_name |
---|
ASP |
Assembly |
AutoIt |
Awk |
Bash |
[13]:
%%sql
SELECT DISTINCT language_name
FROM language
ORDER BY random()
LIMIT 5;
* sqlite:///data/faculty.db
Done.
[13]:
language_name |
---|
Fortran |
CoffeeScript |
Julia |
ECMAScript |
Awk |
Filtering¶
For efficiency, place the most stringent filters first.
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
WHERE predicate
ORDER BY value_expression
Predicates for filtering rows¶
Comparison operators (=, <>, <, >, <=, >=)
BETWEEN start AND end
IN(A, B, C)
LIKE
IS NULL
REGEX
Use NOT prefix for negation
Combining predicates¶
AND
OR
USe parenthesis to indicate order of evaluation for compound statements.
[14]:
%%sql
SELECT first, last, age
FROM person
WHERE age BETWEEN 16 AND 17
LIMIT 5;
* sqlite:///data/faculty.db
Done.
[14]:
first | last | age |
---|---|---|
Antoine | Beard | 16 |
Augustine | Mejia | 16 |
Boris | Mejia | 16 |
Brain | Haney | 16 |
Burl | Mayo | 17 |
Joins¶
Joins combine data from 1 or more tables to form a new result set.
Note: To join on multiple columns just use AND
in the ON
expression
Natural join¶
Uses all common columns in Tables 1 and 2 for JOIN
FROM Table1
NATURAL INNER JOIN Table 2
Inner join¶
General form of INNER JOIN uisng ON
FROM Table1
INNER JOIN Table2
ON Table1.Column = Table2.Column
Note: This is equivalent to an EQUIJOIN but more flexible in that additional JOIN conditions can be specified.
SELECT *
FROM Table1, Table2
WHERE Table1.Column = Table2.Column
If there is a common column in both tables
FROM Table1
INNER JOIN Table2
USING Column
Joining more than two tables
From (Table1
INNER JOIN Table2
ON Table1.column1 = Table2.Column1)
INNER JOIN Table3
ON Table3.column2 = Table2.Column2
Outer join¶
General form of OUTER JOIN uisng ON
FROM Table1
RIGHT OUTER JOIN Table2
ON Table1.Column = Table2.Column
FROM Table1
LEFT OUTER JOIN Table2
ON Table1.Column = Table2.Column
FROM Table1
FULL OUTER JOIN Table2
ON Table1.Column = Table2.Column
[15]:
%%sql
SELECT first, last, language_name
FROM person
INNER JOIN person_language
ON person.person_id = person_language.person_id
INNER JOIN language
ON language.language_id = person_language.language_id
LIMIT 10;
* sqlite:///data/faculty.db
Done.
[15]:
first | last | language_name |
---|---|---|
Aaron | Alexander | Haskell |
Aaron | Kirby | GNU Octave |
Aaron | Kirby | haXe |
Aaron | Kirby | Falcon |
Abram | Allen | TypeScript |
Abram | Boyer | Io |
Abram | Boyer | Lua |
Abram | Boyer | Falcon |
Adan | Brown | F# |
Adolph | Dalton | Dart |
Set operations¶
SELECT a, b
FROM table1
SetOp
SELECT a, b
FROM table2
wehre SetOp is INTERSECT
, EXCEPT
, UNION
or UNION ALL
.
Union¶
UNION
UNION ALL (does not eliminate duplicate rows)
Difference¶
EXCEPT
Alternative using OUTER JOIN
with test for NULL
[16]:
%%sql
DROP VIEW IF EXISTS language_view;
CREATE VIEW language_view AS
SELECT first, last, language_name
FROM person
INNER JOIN person_language
ON person.person_id = person_language.person_id
INNER JOIN language
ON language.language_id = person_language.language_id
;
* sqlite:///data/faculty.db
Done.
Done.
[16]:
[]
[17]:
%%sql
SELECt *
FROM language_view
LIMIT 10;
* sqlite:///data/faculty.db
Done.
[17]:
first | last | language_name |
---|---|---|
Aaron | Alexander | Haskell |
Aaron | Kirby | GNU Octave |
Aaron | Kirby | haXe |
Aaron | Kirby | Falcon |
Abram | Allen | TypeScript |
Abram | Boyer | Io |
Abram | Boyer | Lua |
Abram | Boyer | Falcon |
Adan | Brown | F# |
Adolph | Dalton | Dart |
[18]:
%%sql
SELECt *
FROM language_view
WHERE language_name = 'Python'
UNION
SELECt *
FROM language_view
WHERE language_name = 'Haskell'
LIMIT 10;
* sqlite:///data/faculty.db
Done.
[18]:
first | last | language_name |
---|---|---|
Aaron | Alexander | Haskell |
Andree | Douglas | Haskell |
Arlie | Terrell | Python |
Boyd | Blackwell | Haskell |
Buck | Howe | Haskell |
Carlton | Richard | Haskell |
Carylon | Zamora | Python |
Clarisa | Rodgers | Python |
Dinorah | O'brien | Haskell |
Dorian | Lloyd | Haskell |
[19]:
%%sql
SELECt *
FROM language_view
WHERE language_name IN ('Python', 'Haskell')
ORDER BY first
LIMIT 10;
* sqlite:///data/faculty.db
Done.
[19]:
first | last | language_name |
---|---|---|
Aaron | Alexander | Haskell |
Andree | Douglas | Haskell |
Arlie | Terrell | Python |
Boyd | Blackwell | Haskell |
Buck | Howe | Haskell |
Carlton | Richard | Haskell |
Carylon | Zamora | Python |
Clarisa | Rodgers | Python |
Dinorah | O'brien | Haskell |
Dorian | Lloyd | Haskell |
Aggregate functions¶
COUNT
MIN
MAX
AVG
SUM
[20]:
%%sql
SELECT count(language_name)
FROM language_view;
* sqlite:///data/faculty.db
Done.
[20]:
count(language_name) |
---|
2297 |
Grouping¶
SELECT a, MIN(b) AS min_b, MAX(b) AS max_b, AVG(b) AS mean_b
FROM table
GROUP BY a
HAVING mean_b > 5
The HAVING
is analagous to the WHERE
clause, but filters on aggregate conditions. Note that the WHERE
statement filters rows BEFORE the grouping is done.
Note: Any variable in the SELECT part that is not an aggregte function needs to be in the GROUP BY part.
SELECT a, b, c, COUNT(d)
FROM table
GROUP BY a, b, c
[21]:
%%sql
SELECT language_name, count(*) AS n
FROM language_view
GROUP BY language_name
HAVING n > 45;
* sqlite:///data/faculty.db
Done.
[21]:
language_name | n |
---|---|
AutoIt | 61 |
Bash | 48 |
ECMAScript | 48 |
GNU Octave | 49 |
JavaScript | 48 |
Perl | 55 |
PowerShell | 50 |
Prolog | 50 |
The CASE switch¶
Simple CASE¶
SELECT name,
(CASE sex
WHEN 'M' THEN 1.5*dose
WHEN 'F' THEN dose
END) as adjusted_dose
FROM table
Searched CASE¶
SELECT name,
(CASE
WHEN sex = 'M' THEN 1.5*dose
WHEN sex = 'F' THEN dose
END) as adjusted_dose
FROM table
[22]:
%%sql
SELECT first, last, language_name,
(CASE
WHEN language_name LIKE 'H%' THEN 'Hire'
ELSE 'FIRE'
END
) AS outcome
FROM language_view
LIMIT 10;
* sqlite:///data/faculty.db
Done.
[22]:
first | last | language_name | outcome |
---|---|---|---|
Aaron | Alexander | Haskell | Hire |
Aaron | Kirby | GNU Octave | FIRE |
Aaron | Kirby | haXe | Hire |
Aaron | Kirby | Falcon | FIRE |
Abram | Allen | TypeScript | FIRE |
Abram | Boyer | Io | FIRE |
Abram | Boyer | Lua | FIRE |
Abram | Boyer | Falcon | FIRE |
Adan | Brown | F# | FIRE |
Adolph | Dalton | Dart | FIRE |
User defined functions (UDF)¶
[23]:
import sqlite3
[24]:
import random
import statistics
[25]:
con = sqlite3.connect(":memory:")
[26]:
con.create_function("rnorm", 2, random.normalvariate)
[27]:
cr = con.cursor()
[28]:
cr.execute('CREATE TABLE foo(num REAL);')
[28]:
<sqlite3.Cursor at 0x1165811f0>
[29]:
cr.execute("""
INSERT INTO foo(num)
VALUES
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1)),
(rnorm(0,1))
""")
[29]:
<sqlite3.Cursor at 0x1165811f0>
[30]:
cr.execute('SELECT * from foo')
cr.fetchall()
[30]:
[(1.951537604398419,),
(1.5075295059349978,),
(-2.118904989108828,),
(0.2542001383796541,),
(0.411832441172589,),
(1.1235289209771866,),
(0.9641950352497476,),
(0.3895096522469744,)]
[31]:
class Var:
def __init__(self):
self.acc = []
def step(self, value):
self.acc.append(value)
def finalize(self):
if len(self.acc) < 2:
return 0
else:
return statistics.variance(self.acc)
[32]:
con.create_aggregate("Var", 1, Var)
[33]:
cr.execute('SELECT Var(num) FROM foo')
cr.fetchall()
[33]:
[(1.5194560216303632,)]
[34]:
con.close()