SQL
Creating new database
It is time to create our own database (and run our first DDL (Data Definition Language) statement):
CREATE DATABASE mydb; -- You can pick any name you want.
Let’s check that we have create a new database successfully:
\l
You should now see your database in the list. To make sure that we perform all further operations in our new database let’s switch to it:
\connect mydb;
Now we are ready to practice our first SQL commands.
Basic SQL commands
Our new database is completely empty. We can check that by listing all tables:
\dt
— You should see ‘Did not find any relations.’ confirming that no relations (tables) exist in your database.
Let’s start by creating our first table:
CREATE TABLE firsttable (
id INT,
text VARCHAR(14)
);
Let’s confirm that new table was added:
\dt
— Now you should see your table in the list.
Schema | Name | Type | Owner |
---|---|---|---|
public | firsttable | table | postgres |
We can see what data is in our relation:
SELECT * FROM firsttable;
You can view general information about the tables using this command:
SELECT
table_schema,
table_name,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale,
is_nullable
FROM information_schema.columns
WHERE table_name = 'firsttable'
ORDER BY ordinal_position;
This query will return information about each column in the firsttable table, such as the column’s name, its data type, maximum length for character-based types, numeric precision and scale for numeric types, and whether the column can contain NULL values. We can put some data into our table using INSERT INTO statement:
INSERT INTO firsttable (id, text) VALUES (1, 'anything');
Postgres gives INSERT 0 1 to tell you that 1 row has been inserted. The 0 is because of a deprecated behavior with OID. You can see more details about this here . You can check that the data was added by running the SELECT statement again. Another DML (Data Manipulation Language) statement is UPDATE which allows us to modify data:
UPDATE firsttable
SET text = 'somethingelse'
WHERE id = 1;
Run the SELECT statement again to ensure that update was performed successfully. Finally, last DML statement we will learn about today is DELETE:
DELETE FROM firsttable;
Now our table should be empty. Note, that DELETE only deleted all data from our relation, but not the relation itself. To remove our relation we need to run the DDL DROP command:
DROP TABLE firsttable;
You can practise more by creating more relations, inseting, modyfing, and deleting data from them.
Data types
PostgreSQL support multiple data types. Let’s try to create the following relation:
CREATE TABLE employees (
emp_no INT,
birth_date DATE,
name VARCHAR(14),
salary NUMERIC(10,2),
permanent BOOLEAN,
PRIMARY KEY (emp_no)
);
Now let’s put some data into this relation:
INSERT INTO employees VALUES (1, '1984-11-02', 'Max Maxim', 10000.00 , true);
-- Invalid type for column emp_no
INSERT INTO employees VALUES ('A', '1983-02-21', 'Artem Nik', 2000.00 , false);
-- Implicit type conversion conversion
INSERT INTO employees VALUES ('2', '1983-02-21', 'Artem Nik', 2000.00 , false);
-- Value for first_name is too long
INSERT INTO employees
VALUES (3, '1983-02-21', 'QWERTYUIOPASDFGHJKLL', 3000.00 , false);
There are many datatypes that are supported by PostgreSQL. You can find the full list here: https://www.postgresql.org/docs/current/datatype.html
NULL vs NOT NULL, DEFAULT
Now let’s try to create some more realistic relations:
CREATE TABLE professor (
pers_nr INT,
name VARCHAR(30) NOT NULL,
level CHAR(2) default 'AP',
department CHAR(2) NULL,
PRIMARY KEY (pers_nr)
);
There are several new concepts used in our new relation. Let’s see what they mean. First let’s see how DEFAULT works:
-- Adds professor tuple
INSERT INTO professor VALUES (1, 'Artem', 'AP', 'CS');
-- Adds other tuple. Since value for level is not provided - DEFAULT value would be used
INSERT INTO professor (pers_nr, name, department)
VALUES (2, 'Olena', 'CS');
-- In this case value for department is not provided.
-- Since there is no default value - value would be set to NULL
INSERT INTO professor VALUES (3, 'Olaf');
SELECT * FROM professor;
Now let’s play with NULL vs NOT NULL columns:
-- Level is set to NULL (not default value)
INSERT INTO professor VALUES (4, 'Ge', NULL, 'CS');
-- Would return an error, since name is set to NOT NULL and no value is provided
INSERT INTO professor (pers_nr, level, department)
VALUES (5, 'AP', 'CS');
SELECT * FROM professor;
PRIMARY KEY
You have probably noticed that our relation ‘professor’ has a PRIMARY KEY. Let’s work with it:
-- Adds new tuple
INSERT INTO professor VALUES (7, 'Martin', NULL, 'CM');
-- Returns error cause of duplicate key
INSERT INTO professor VALUES (7, 'Sam', NULL, 'CM');
-- Returns error cause PRIMARY KEY can't be null
INSERT INTO professor VALUES (NULL, 'Sam', NULL, 'CM');
-- Returns error cause PRIMARY KEY can't be null
INSERT INTO professor (name, level, department)
VALUES ('Sam', NULL, 'CM');
SELECT * FROM professor;
Sometimes you may want the PRIMARY KEY to be automatically generated by DBMS:
-- Let's remove our relation
DROP TABLE professor;
-- Create new table
CREATE TABLE professor (
pers_nr SERIAL, -- Notice change in this column
name VARCHAR(30) NOT NULL,
level CHAR(2) DEFAULT 'AP',
department CHAR(2) NULL,
PRIMARY KEY (pers_nr)
);
-- Adding data to our table
INSERT INTO professor VALUES ('Artem', 'AP', 'CS');
INSERT INTO professor (name, department) VALUES ('Olena', 'CS');
INSERT INTO professor VALUES ('Olaf');
INSERT INTO professor VALUES ('Ge', NULL, 'CS');
INSERT INTO professor VALUES ('Martin', NULL, 'CM');
-- As you can see, pers_nr was automatically generated
SELECT * FROM professor;
In some cases, PRIMARY KEY can contain several columns:
CREATE TABLE phone_book (
country_code CHAR(3),
phone CHAR(10),
name VARCHAR(20),
PRIMARY KEY (country_code, phone)
);
-- Adding first record
INSERT INTO phone_book (country_code, phone, name)
VALUES ('41', '0791111111', 'A');
-- Added without error, cause country_code is different
INSERT INTO phone_book (country_code, phone, name)
VALUES ('42', '0791111111', 'B');
-- Error, cause combination (country_code, phone) is same as in already existing record
INSERT INTO phone_book (country_code, phone, name)
VALUES ('41', '0791111111', 'D');
SELECT * FROM phone_book;
Note: Another syntax to define one-column PRIMARY KEY:
CREATE TABLE professor (
pers_nr INT PRIMARY KEY,
name VARCHAR(30) NOT NULL,
level CHAR(2) default 'AP',
department CHAR(2) NULL
);
ALTER TABLE
You may want to modify an existing relation without re-creating it (without losing existing data):
-- Adding new column to existing table
ALTER TABLE professor ADD COLUMN age INT;
-- Removing column
ALTER TABLE professor DROP COLUMN age;
-- Won't work, cause there would be no values for new NOT NULL column
ALTER TABLE professor ADD COLUMN age INT NOT NULL;
-- But this would work, since we now provided default value
ALTER TABLE professor ADD COLUMN age INT NOT NULL DEFAULT 0;
-- Chaning lenght of the column
ALTER TABLE professor ALTER COLUMN name TYPE VARCHAR(25);
WHERE in UPDATE and DELETE statements
You already know how to delete all records in the table. While this may be useful sometimes, most of the time you would want to delete only particular records:
-- Delete record with pers_nr = 7
DELETE FROM professor
WHERE pers_nr = 7;
-- Delete all records from 'CM' department with level 'AP'
DELETE FROM professor
WHERE department = 'CM' AND level = 'AP';
The same is true for the UPDATE statement:
-- Update all records with level = 'AP'
UPDATE professor
SET department = 'CS'
WHERE level = 'AP';
-- Set level to 'AP' for all records where level is NULL now
UPDATE professor
SET level = 'AP'
WHERE level is NULL;
To practice writting queries we will need some tables:
DROP TABLE IF EXISTS professor;
CREATE TABLE professor (
prof_nr INT,
name VARCHAR(30),
level CHAR(2) default 'AP',
room INT,
PRIMARY KEY (prof_nr)
);
DROP TABLE IF EXISTS student;
CREATE TABLE student (
pers_nr INT,
name VARCHAR(100),
semester INT,
PRIMARY KEY (pers_nr)
);
DROP TABLE IF EXISTS lecture;
CREATE TABLE lecture (
lecture_id INT,
title VARCHAR(100),
cp INT,
prof_nr INT,
PRIMARY KEY (lecture_id)
);
DROP TABLE IF EXISTS requires;
CREATE TABLE requires (
prerequisite INT,
followup INT,
PRIMARY KEY (prerequisite, followup)
);
DROP TABLE IF EXISTS attends;
CREATE TABLE attends (
pers_nr INT,
lecture_id INT,
PRIMARY KEY (pers_nr, lecture_id)
);
DROP TABLE IF EXISTS tests;
CREATE TABLE tests (
pers_nr INT,
lecture_id INT,
grade INT,
PRIMARY KEY (pers_nr, lecture_id)
);
-- Add some data
INSERT INTO professor (prof_nr, name, level, room)
VALUES (2125, 'John', 'AP', 226),
(2126, 'David', 'FP', 232),
(2127, 'Anna', 'FP', 310);
SELECT
The SQL SELECT statement is used to retrieve records from one or more tables in your SQL database. The records retrieved are known as a result set. You are already familiar with the simplest select:
-- Simplest SELECT statement
SELECT * FROM professor;
It is actually not recommended to use * in SELECT statements in application development cause of potential issues if new columns are later added to the table. Instead, you can specify the exact list of columns that you want to get (this is also called projection):
SELECT prof_nr, name
FROM professor;
WHERE
You may want to filter (this is also called selection) the data you want to see. The SQL WHERE clause is used to filter the results and apply conditions in a SELECT, INSERT, UPDATE, or DELETE statement:
SELECT prof_nr, name
FROM professor
WHERE level = 'FP';
SELECT prof_nr, name
FROM professor
WHERE level = 'FP' AND room = 310; -- Any logical expression is supported
Aliases
Inside SELECT statement you can not only use the columns of a table, but also any arithmetic expression that you want:
SELECT prof_nr, level || ' ' || name, room + 100, 'fixed text'
FROM professor
WHERE level = 'FP';
You may want to give names to new columns by using aliases:
SELECT prof_nr, level || ' ' || name AS new_name,
room + 100 AS changed_room, 'fixed text' AS useless_column
FROM professor
WHERE level = 'FP';
You also can give table an alias. Would be useful when we would use several tables in one query:
SELECT p.prof_nr, p.level
FROM professor p
WHERE level = 'FP';
Operations with NULL
NULL has a special place in DBMS logic. Remember that:
Arithmetic(NULL + 1) -> NULL (NULL * 0) -> NULL
Comparisons(NULL = NULL) -> Unknown (NULL < 13) -> Unknown (NULL > NULL) -> Unknown
3-value Logic NOTNOT True -> False NOT Unknown -> Unknown NOT False -> True
ANDTrue AND True -> True True AND Unknown -> Unknown True AND FALSE -> FALSE Unknown AND Unknown -> Unknown Unknown AND FALSE -> FALSE
ORTrue OR True -> True True OR Unknown -> True True OR FALSE -> True Unknown OR Unknown -> Unknown Unknown OR FALSE -> Unknown
Let’s see it in practice:
INSERT INTO lecture (lecture_id, title, cp, prof_nr)
VALUES (1, 'Databases', 4, NULL),
(2, 'Networks', 4, 2126),
(3, 'Algebra', NULL, 2127);
-- We would not get Algebra, bacause result of cp >= 4 OR cp < 4 is Unknown
SELECT *
FROM lecture
WHERE cp >= 4 OR cp < 4;
-- Notice the new_cp value for Algebra
SELECT title, cp + 100 AS new_cp
FROM lecture;
-- You can check if some value is NULL or NOT NULL
SELECT *
FROM lecture
WHERE cp IS NULL and prof_nr IS NOT NULL;
Type conversion
Type coversion can be implicit and explicit. Implicit means that the database engine will convert the data type automatically, a process invisible to the user. Explicit means that you must specify how the data type should be converted:
SELECT prof_nr, name,
room + '100' AS room1, -- converted to int
room || '100' AS room2, -- converted to string
CAST (room AS VARCHAR(3)) AS room3 -- explicit conversion
FROM professor;
UNION, INTERSECT, EXCEPT
Add some data first:
INSERT INTO student (pers_nr, name, semester)
VALUES (24002, 'Peter', 8),
(25403, 'Mary', 8),
(26120, 'Anna', 8);
Now we can see how the different operations work. The UNION operator is used to combine the result-set of two or more SELECT statements. The UNION operator selects only distinct values by default:
-- Returns 5 names (no duplicate Anna)
(SELECT name FROM student)
UNION
(SELECT name FROM professor);
To allow duplicate values, use UNION ALL:
-- Returns 6 names (two Anna)
(SELECT name FROM student)
UNION ALL
(SELECT name FROM professor);
The SQL INTERSECT operator is used to return the results of 2 or more SELECT statements. However, it only returns the rows selected by all queries or data sets. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results:
-- Only returns Anna
(SELECT name FROM student)
INTERSECT
(SELECT name FROM professor);
The SQL EXCEPT operator is used to return all rows in the first SELECT statement that are not returned by the second SELECT statement. Each SELECT statement will define a dataset. The EXCEPT operator will retrieve all records from the first dataset and then remove from the results all records from the second dataset:
-- And now we got Mary and Peter
(SELECT name FROM student)
EXCEPT
(SELECT name FROM professor);
DISTINCT
The SELECT DISTINCT statement is used to return only distinct (different) values. Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values:
-- Returns 3 records. Two are the same.
SELECT level
FROM professor;
-- Now duplicates are removed
SELECT DISTINCT level
FROM professor;
-- Again 3 records, cause there are no full duplicated anymore
SELECT DISTINCT level, name
FROM professor;
ORDER BY
Tables are not sorted by default; if the result needs to be sorted, the query needs to specify it explicitly. The ORDER BY keyword is used to sort the result-set in ascending or descending order. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword:
SELECT prof_nr, name, level, room
FROM professor
ORDER BY level ASC, name DESC;
LIMIT
The SELECT LIMIT statement is used to retrieve records from one or more tables in PostgreSQL and limit the number of records returned based on a limit value:
SELECT prof_nr, name, level, room
FROM professor
ORDER BY level ASC, name DESC
LIMIT 2;
Limit without sorting can be used to sample the data, but since order of records is not guaranteed (unless specified by ORDER BY), you should never rely on particular order or that same query would return same results.
GROUP BY
The GROUP BY statement groups rows that have the same values into summary rows, like “find the number of customers in each country”. The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns:
-- Returns level and count of records per each level
SELECT level, COUNT(*) as count
FROM professor
GROUP BY level;
-- Returns total count of records
SELECT COUNT(*)
FROM professor;
NULL in aggregations
CREATE TABLE employee (
empl_id SERIAL PRIMARY KEY,
name VARCHAR(20),
department VARCHAR(10),
salary INT
);
INSERT INTO employee (name, department, salary)
VALUES ('John Smith', 'IT', 2000),
('Mary Popins', 'FINANCES', 2000),
('Joan Piquet', 'IT', 2500),
('Jose Gomez', 'IT', NULL);
-- Shows 3 records in IT, cause count(*) counts all records
SELECT department, count(*)
FROM employee
GROUP BY department;
-- Shows only 2 records in IT, cause count(field) only counts records
-- where value is not NULL
SELECT department, count(salary)
FROM employee
GROUP BY department;
-- NULL value is ignored during sum
SELECT department, sum(salary)
FROM employee
GROUP BY department;
HAVING
The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions:
SELECT department, count(*)
FROM employee
GROUP BY department
HAVING COUNT(*) > 1; -- would return only IT
More advanced aggregations
We’ll create a table called sales with columns for the date of the sale, the item sold, the quantity sold, and the price per unit:
CREATE TABLE sales (
sale_date DATE,
item_name TEXT,
quantity INT,
price NUMERIC(10, 2)
);
Now let’s insert some data into the table:
INSERT INTO sales (sale_date, item_name, quantity, price)
VALUES
('2022-02-01', 'Apple', 5, 1.25),
('2022-02-01', 'Orange', 3, 0.75),
('2022-02-01', 'Banana', 2, 0.50),
('2022-02-02', 'Apple', 10, 1.25),
('2022-02-02', 'Orange', 6, 0.75),
('2022-02-02', 'Banana', 4, 0.50),
('2022-02-03', 'Apple', 3, 1.25),
('2022-02-03', 'Orange', 2, 0.75),
('2022-02-03', 'Banana', 1, 0.50);
To get the total revenue for each item, we can use the GROUP BY clause along with the SUM function and some basic arithmetic:
SELECT item_name, SUM(quantity * price) AS total_revenue
FROM sales
GROUP BY item_name;
To get the total number of sales for each day of the week (assuming sale_date is a DATE column):
SELECT DATE_TRUNC('week', sale_date) AS week_start_date,
DATE_TRUNC('day', sale_date) AS sale_date,
COUNT(1) AS total_sales
FROM sales
GROUP BY 1, 2
ORDER BY 1, 2;
This query uses the DATE_TRUNC function to group sales by the start of the week and the sale date. The COUNT(1) function is used to count the number of sales in each group (including NULLS). COUNT(1) and COUNT(*) are semantically equivalent, the only difference is that COUNT(*) also counts the NULL values. You can read more information about the differences here .
Get the average price and quantity sold for each item, and the total number of sales:
SELECT item_name, AVG(price) AS avg_price,
AVG(quantity) AS avg_quantity, COUNT(*) AS num_sales
FROM sales
GROUP BY item_name
ORDER BY num_sales DESC;
This query groups the sales by item using the GROUP BY clause. The AVG() function is used to calculate the average price and quantity sold for each item, and the COUNT() function is used to count the total number of sales for each item. The ORDER BY clause sorts the results by total number of sales in descending order.
JOINs
Let us first populate the tables with the following data:
DROP TABLE IF EXISTS professor;
CREATE TABLE professor (
prof_nr INT,
name VARCHAR(30),
level CHAR(2) default 'AP',
room INT,
PRIMARY KEY (prof_nr)
);
DROP TABLE IF EXISTS student;
CREATE TABLE student (
pers_nr INT,
name VARCHAR(100),
semester INT,
PRIMARY KEY (pers_nr)
);
DROP TABLE IF EXISTS lecture;
CREATE TABLE lecture (
lecture_id INT,
title VARCHAR(100),
cp INT,
prof_nr INT,
PRIMARY KEY (lecture_id)
);
DROP TABLE IF EXISTS requires;
CREATE TABLE requires (
prerequisite INT,
followup INT,
PRIMARY KEY (prerequisite, followup)
);
DROP TABLE IF EXISTS attends;
CREATE TABLE attends (
pers_nr INT,
lecture_id INT,
PRIMARY KEY (pers_nr, lecture_id)
);
DROP TABLE IF EXISTS tests;
CREATE TABLE tests (
pers_nr INT,
lecture_id INT,
grade FLOAT,
PRIMARY KEY (pers_nr, lecture_id)
);
-- Add some data
INSERT INTO professor (prof_nr, name, level, room)
VALUES (2125, 'John', 'AP', 226),
(2126, 'David', 'FP', 232),
(2127, 'Anna', 'FP', 310);
INSERT INTO student (pers_nr, name, semester)
VALUES (24002, 'Peter', 8),
(25403, 'Mary', 8),
(26120, 'Anna', 8);
INSERT INTO lecture (lecture_id, title, cp, prof_nr)
VALUES (1, 'Databases', 4, NULL),
(2, 'Networks', 4, 2126),
(3, 'Algebra', NULL, 2127),
(4, 'Cloud', 4, 2126),
(5, 'Art', 4, 2126),
(6, 'History', 5, 2127);
INSERT INTO attends (pers_nr, lecture_id)
VALUES (24002, 1), (24002, 2), (24002, 3),
(25403, 1), (25403, 2),
(26120, 2), (26120, 3);
INSERT INTO tests (pers_nr, lecture_id, grade)
VALUES (24002, 1, 5.0), (24002, 2, 4.75), (24002, 3, 5.5),
(25403, 1, 5.25), (25403, 2, 6.0),
(26120, 2, 4.5), (26120, 3, 4.0);
INSERT INTO requires (prerequisite, followup)
VALUES (2, 1), (2, 4), (5, 6);
A JOIN clause is used to combine rows from two or more tables, based on a related column between them. Here are the different types of the JOINs in SQL:
- CROSS JOIN: A cartesian product of the two tables.
- (INNER) JOIN: Returns records that have matching values in both tables
- LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table
- RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table
- FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table
Cross product
-- Cross product (or CROSS JOIN)
-- We get all combinations of professor-lecture pairs.
SELECT name, title
FROM professor p,
lecture l;
-- It is also called CROSS JOIN, and following SQL would produce same result
SELECT name, title
FROM professor p
CROSS JOIN lecture l;
-- By itself is used rare, only in cases all combinations are required.
-- By becomes much more usefull is used with WHERE.
-- Now we get names of professors and lecture that they are responsible for.
SELECT name, title
FROM professor p,
lecture l
WHERE p.prof_nr = l.prof_nr;
INNER JOIN
Inner Join (keep only matched tuples):
-- Returns 2 rows
-- Note, that join condition is moved to ON (instead of WHERE)
SELECT name, title
FROM professor p
INNER JOIN lecture l ON p.prof_nr = l.prof_nr;
-- Find the average grade for each lecture
SELECT l.title AS lecture_title, AVG(t.grade) AS average_grade
FROM lecture l
INNER JOIN tests t ON l.lecture_id = t.lecture_id
GROUP BY l.title;
-- Find the average grade for each student
SELECT s.name, AVG(t.grade) AS average_grade
FROM student s
INNER JOIN tests t ON s.pers_nr = t.pers_nr
GROUP BY s.pers_nr, s.name;
-- observe that each non-aggregated column from select (such as s.name in this case) should appear in GROUP BY.
LEFT OUTER JOIN
Another type of join is called a LEFT OUTER JOIN. This type of join returns all rows from the LEFT-hand table specified in the ON condition and only those rows from the other table where the joined fields are equal (join condition is met):
-- Returns 3 rows, because now we also have John who has no lectures.
-- Keyword OUTER can be skipped. Would return same result
SELECT name, title
FROM professor p
LEFT JOIN lecture l ON p.prof_nr = l.prof_nr;
RIGHT OUTER JOIN
Another type of join is called a RIGHT OUTER JOIN. This type of join returns all rows from the RIGHT-hand table specified in the ON condition and only those rows from the other table where the joined fields are equal (join condition is met). It is basically symmetrical to LEFT OUTER JOIN:
-- Returns 3 rows. Now we do not have John (as with LEFT JOIN),
-- but have Databases (which do not have any professor attached to it).
-- Keyword OUTER can be skipped. Would return same result
SELECT name, title
FROM professor p
RIGHT JOIN lecture l ON p.prof_nr = l.prof_nr;
FULL OUTER JOIN
Another type of join is called a PostgreSQL FULL OUTER JOIN. This type of join returns all rows from the LEFT-hand table and RIGHT-hand table with nulls in place where the join condition is not met:
-- Returns 4 rows. Now we have both John (with no lecture attached)
-- and Databases (with no professor attached)
-- Keyword OUTER can be skipped. Would return same result
SELECT name, title
FROM professor p
FULL JOIN lecture l ON p.prof_nr = l.prof_nr;
NATURAL JOIN, USING
NATURAL JOIN automatically detects JOIN condition (by assuming that all fields with same names in tables we join should be a match). Natural Joins are supported by some DBMS systems, but not all of them and are usually not used in practise. Natural Joins can be INNER, LEFT or RIGHT (more on that later):
-- No ON condition is specified since it is built automatically.
SELECT name, title
FROM professor p
NATURAL INNER JOIN lecture l;
Another syntax to write JOIN is using the USING statement. While NATURAL JOIN automatically selects columns to JOIN, it is possible to specify the column (by not the full condition). All 3 SELECTs would return the same result (difference is in syntax only.):
-- No ON condition is specified since it is built automatically.
SELECT name, title
FROM professor p
NATURAL INNER JOIN lecture l;
-- Columns are listed, but equality condition is assumed.
SELECT name, title
FROM professor p
INNER JOIN lecture l USING (prof_nr);
SELECT name, title
FROM professor p
INNER JOIN lecture l ON p.prof_nr = l.prof_nr; -- Fully specifing ON condition
However, you need to be careful with NATURAL JOINS:
DROP TABLE IF EXISTS company;
DROP TABLE IF EXISTS employee;
CREATE TABLE company(
company_id INT PRIMARY KEY,
name VARCHAR(30),
tax_code VARCHAR(10)
);
CREATE TABLE employee(
employee_id INT PRIMARY KEY,
name VARCHAR(30),
tax_code VARCHAR(30),
permanent INT, -- 0 - temporary, 1 - permanent
company_id INT
);
INSERT INTO company(company_id, name, tax_code)
VALUES (1, '2 Permanent', '100'),
(2, '1 Permanent, 1 Temp', '200'),
(3, '2 Temp', '300');
INSERT INTO employee (employee_id, name, tax_code, permanent, company_id)
VALUES (1, 'E1', '001', 1, 1),
(2, 'E2', '002', 1, 1),
(3, 'E3', '003', 1, 2),
(4, 'E4', '004', 0, 2),
(5, 'E5', '005', 0, 3),
(6, 'E6', '006', 0, 3);
-- Assume that we want to return all company names and number of permanent employees
-- This won't work, because NATURAL JOIN will try to
-- join tables on company_id, name and tax_code.
SELECT c.name, count(e.employee_id)
FROM company c
NATURAL LEFT JOIN employee e
WHERE e.permanent = 1
GROUP BY c.name;
-- Now JOIN is made using correct column, but in the result,
-- we are missing the companies with only Temporary employees.
-- This works because column name is the same.
-- If name of column on Company table was just "id" - this wont' work.
SELECT c.name, count(e.employee_id)
FROM company c
LEFT JOIN employee e USING (company_id)
WHERE e.permanent = 1
GROUP BY c.name;
-- In the previous example, the result of the left join will contain all companies,
-- but the where clause would filter out the companies without permanent employees
-- The order of execution:
-- FROM and/or JOIN clause.
-- WHERE clause.
-- GROUP BY clause.
-- HAVING clause.
-- SELECT clause.
-- DISTINCT clause.
-- ORDER BY clause.
-- LIMIT and/or OFFSET clause.
-- We can fix last query by extending ON condition.
SELECT c.name, count(e.employee_id)
FROM company c
LEFT JOIN employee e ON c.company_id = e.company_id AND e.permanent = 1
GROUP BY c.name;
Note, that we could have got correct result in many other ways. Last query was used just to demonstate the flexibility of full ON condition. Let’s now see a more advanced query using an inner join:
-- Rank students by their average grade in descending order
SELECT s.name, AVG(t.grade) AS avg_grade, RANK()
OVER (ORDER BY AVG(t.grade) DESC) AS rank
FROM student s
INNER JOIN tests t ON s.pers_nr = t.pers_nr
GROUP BY s.pers_nr, s.name;
RANK() OVER (ORDER BY AVG(t.grade) DESC) AS rank
: This is a window function that calculates the rank for each student based on their average grade. The RANK() function assigns a unique rank to each distinct value in the specified column, with the same rank for the same values. The OVER (ORDER BY AVG(t.grade) DESC) clause specifies the order in which the ranking will be calculated. In this case, it orders the students by their average grade in descending order (highest average grade first). The rank will be assigned accordingly.
Self-joins
Get a list of professors sharing the same room:
SELECT p1.name AS professor1, p2.name AS professor2, p1.room
FROM professor p1
INNER JOIN professor p2 ON p1.room = p2.room AND p1.prof_nr <> p2.prof_nr;
Subqueries
Subquery is a query within a query. You can create subqueries within your SQL statements. These subqueries can reside in the WHERE clause, the FROM clause, or the SELECT clause.
Subquery in FROM
A subquery can also be found in the FROM clause. These are also called inline views.
-- Inner query returns only lectures with title Networks.
-- l is an alias given to that query results.
SELECT p.name, l.title
FROM professor p
LEFT JOIN (SELECT * FROM lecture WHERE title = 'Networks') l
ON p.prof_nr = l.prof_nr;
-- A simpler example of a subquery in a FROM clause:
SELECT * FROM (SELECT name AS p2_name FROM professor) AS p2;
Subquery in SELECT
A subquery can also be found in the SELECT clause. These are generally used when you wish to retrieve a calculation using an aggregate function such as the sum, count, min, max , or avg function, but you do not want the aggregate function to apply to the main query.
-- Adding some more data
-- Returns the name of professor and the count of lectures
SELECT p.name, (SELECT count(title) FROM lecture l
WHERE p.prof_nr = l.prof_nr) AS num_of_lectures
FROM professor p;
-- Alternatively, this can be computed using the left join:
SELECT p.name, COUNT(l.title) AS num_of_lectures
FROM professor p
LEFT JOIN lecture l ON p.prof_nr = l.prof_nr
GROUP BY p.name;
Note, that in this case subquery should only return 1 row or error would be thrown.
Subquery in WHERE
Finally, most often, the subquery will be found in the WHERE clause. These subqueries are also called nested subqueries:
-- Returns names of professor with lectures with cp >= 4
SELECT p.name
FROM professor p
WHERE p.prof_nr IN (SELECT prof_nr FROM lecture WHERE cp >= 4);
-- Note however, that we could have also used the inner join in this case:
SELECT DISTINCT p.name
FROM professor p
JOIN lecture l ON p.prof_nr = l.prof_nr
WHERE l.cp >= 4;
-- in that case, the DISTINCT is necessary, to avoid repeating the same professor
-- in case it matches with multiple lectures fulfilling the condition
Other usefull statements that can be used with subqueries are EXIST (NOT EXIST), ANY, ALL, SOME.
WITH
Another way to use subqueries is by using WITH statement. The SQL WITH clause provides a way of defining a temporary relation whose definition is available only to the query in which the WITH clause occurs, SQL WITH clauses is individually-evaluated SELECT statements for use in a larger container query. Using SQL WITH clauses we can simplify complicated queries and reduce statement repetition:
-- Subquery would return lecture data and count of students attending
WITH lecture_attends AS (
SELECT l.lecture_id, l.prof_nr, count(a.pers_nr) as attends_count
FROM lecture l
LEFT JOIN attends a ON a.lecture_id = l.lecture_id
GROUP BY l.lecture_id, l.prof_nr
)
SELECT p.name, SUM(l.attends_count)
FROM professor p
LEFT JOIN lecture_attends l ON l.prof_nr = p.prof_nr -- Now we can use subquery as table
GROUP BY p.prof_nr;
Find the total credit points earned by each student:
WITH student_cp AS (
SELECT s.pers_nr, SUM(l.cp) AS total_cp
FROM student s
INNER JOIN attends a ON s.pers_nr = a.pers_nr
INNER JOIN lecture l ON a.lecture_id = l.lecture_id
GROUP BY s.pers_nr
)
SELECT s.name, student_cp.total_cp
FROM student s
INNER JOIN student_cp ON s.pers_nr = student_cp.pers_nr
Find the top 5 students with the highest average grade and display their names, average grades, and the number of attended lectures:
WITH student_avg_grade AS (
SELECT s.pers_nr, s.name, AVG(t.grade) AS avg_grade
FROM student s
INNER JOIN tests t ON s.pers_nr = t.pers_nr
GROUP BY s.pers_nr, s.name
),
student_lecture_count AS (
SELECT s.pers_nr, COUNT(a.lecture_id) AS lecture_count
FROM student s
INNER JOIN attends a ON s.pers_nr = a.pers_nr
GROUP BY s.pers_nr
)
SELECT sag.name, sag.avg_grade, slc.lecture_count
FROM student_avg_grade sag
INNER JOIN student_lecture_count slc ON sag.pers_nr = slc.pers_nr
ORDER BY sag.avg_grade DESC
LIMIT 5;
Finally let’s see a couple more window functions:
-- Assign a row number to students based on their total
-- credit points earned, in descending order
WITH student_cp AS (
SELECT s.pers_nr, SUM(l.cp) AS total_cp
FROM student s
INNER JOIN attends a ON s.pers_nr = a.pers_nr
INNER JOIN lecture l ON a.lecture_id = l.lecture_id
GROUP BY s.pers_nr
)
SELECT s.name, student_cp.total_cp, ROW_NUMBER()
OVER (ORDER BY student_cp.total_cp DESC) AS row_number
FROM student s
INNER JOIN student_cp ON s.pers_nr = student_cp.pers_nr;
-- if we want to ensure the ordering is fixed, we can also sort by the student name, in addition to the total credit points:
WITH student_cp AS (
SELECT s.pers_nr, SUM(l.cp) AS total_cp
FROM student s
INNER JOIN attends a ON s.pers_nr = a.pers_nr
INNER JOIN lecture l ON a.lecture_id = l.lecture_id
GROUP BY s.pers_nr
)
SELECT s.name, student_cp.total_cp, ROW_NUMBER()
OVER (ORDER BY student_cp.total_cp DESC, s.name ASC) AS row_number
FROM student s
INNER JOIN student_cp ON s.pers_nr = student_cp.pers_nr;
In this query, the ROW_NUMBER() window function assigns a unique row number to each student based on their total credit points earned. The students are ordered by their total credit points in descending order (highest credit points first), and row numbers are assigned accordingly.
-- Retrieve the 3th to 5th students ordered by their names
SELECT s.name
FROM student s
ORDER BY s.name
LIMIT 2 OFFSET 2;
-- Note that this returns only 1 value, since we have only 3 students
In this query, the OFFSET function is used in conjunction with the LIMIT clause to paginate the result set. It orders the students by their names and retrieves the 3th to 5th students. The OFFSET value specifies the starting point of the result set, and the LIMIT value specifies the number of rows to retrieve.
VIEW
In case you want your query to be available permanently, you can create a VIEW based on that query. In SQL, a view is a virtual table based on the result set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add SQL statements and functions to a view and present the data as if the data were coming from one single table. A view is created with the CREATE VIEW statement:
-- Let's say we want to see all dependencies in lectures.
-- This can be obtained using the following query:
SELECT l1.title AS prerequisite, l2.title AS followup
FROM lecture l1
INNER JOIN requires r ON l1.lecture_id = r.prerequisite
LEFT JOIN lecture l2 ON l2.lecture_id = r.followup;
-- If we want to reuse this query later, we can create VIEW based on it.
CREATE VIEW dependencies_view
AS
SELECT l1.title AS prerequisite, l2.title AS followup
FROM lecture l1
INNER JOIN requires r ON l1.lecture_id = r.prerequisite
LEFT JOIN lecture l2 ON l2.lecture_id = r.followup;
-- And now use this VIEW as if this was a table.
SELECT *
FROM dependencies_view
WHERE prerequisite = 'Networks';
Views vs Tables
It is possible to create new table besed on data from a query:
-- We can create permanent table to store this data.
CREATE TABLE dependencies_table
AS
SELECT l1.title AS prerequisite, l2.title AS followup
FROM lecture l1
INNER JOIN requires r ON l1.lecture_id = r.prerequisite
LEFT JOIN lecture l2 ON l2.lecture_id = r.followup;
Note, that this new table and view we have created before are two very different things. You can see it with simple example:
-- At this point both queries should return same result.
SELECT * FROM dependencies_view;
SELECT * FROM dependencies_table;
-- Now let's add new record
INSERT INTO requires (prerequisite, followup)
VALUES (3, 1);
-- We see new record in this view
SELECT * FROM dependencies_view;
-- But nothing changed in this table.
-- It was created based on data that existed at the moment of creation.
SELECT * FROM dependencies_table;
Bulk commands
Load data from files
It is possible to copy table data into a file OR load data into a table from a file:
-- Checking that we have some data in students table.
SELECT * FROM student;
-- Saving this data to file
\copy student TO '/tmp/students';
-- Removing all data from table
DELETE FROM student;
-- Nothing is returned
SELECT * FROM student;
-- Loading data back into the table
\copy student FROM '/tmp/students';
-- We again have our data
SELECT * FROM student;
-- You can also export it as a csv file
\copy student TO '/tmp/students.csv' DELIMITER ',' CSV HEADER;
Loading data from another query
It is also possible to INSERT data into the table in bulk from another query:
-- Using this query we can make sure that everyone attends Databases
INSERT INTO attends (pers_nr, lecture_id)
SELECT pers_nr, lecture_id
FROM student, lecture
WHERE title = 'Databases';