Announcing Crunchy Bridge for Analytics—interact with your data lake using Postgres commands via extension, with a vectorized, parallel query engine. Learn more in our announcement.

Tutorial Instructions

Learn SQL

SQL is a simple language used for retrieving and updating data in a database. A SQL statement is constructed based on what we call “clauses”. We assemble SQL clauses together to return, manipulate, filter, and update data.

The simplest SQL statement is:

SELECT 1;

This tutorial is interactive, run the command in the Postgres shell to the right. You’ll see that result returns a single value of “1”. What if we changed it to:

SELECT 'two';

Now, you’ll see that it returns a string with the value ‘two’. This clause we are running is the SELECT clause. Notice, all SQL statements are finalized by a semicolon: ; .

We have populated this database with a table called “regions” that houses a few regions around the world. To return data from the database, then we will use a FROM clause:

SELECT regions.* FROM regions;

Once we append the FROM regions clause it means we now have access to the values in the regions table. The * can be through of as “return all columns on this table.” If we only want to return a single column, we can be explicit about the values in the SELECT clause:

SELECT regions.region_name FROM regions;

Next, we can use a WHERE clause to restrict the filter to certain regions:

SELECT regions.* FROM regions WHERE regions.region_id = 1;

Experiment changing the values of the WHERE conditional to see how the different values work. In the where clause, you have access to all types of comparison operators, include greater than, less than, equal, contained in a list, etc:

SELECT regions.* FROM regions WHERE regions.region_id < 2; SELECT regions.* FROM regions WHERE regions.region_id > 2; SELECT regions.* FROM regions WHERE regions.region_id >= 2; SELECT regions.* FROM regions WHERE regions.region_name IN ('Asia', 'Americas');

Additionally, you can use AND and OR logic to create complex conditionals (this SQL statement is getting larger, so let’s break it into multiple lines — notice how we break it along the clauses):

SELECT regions.* FROM regions WHERE regions.region_id >= 2 AND regions.region_name IN ('Asia', 'Americas');

SQL gives us the ability to append a sorting clause called ORDER BY:

SELECT regions.* FROM regions ORDER BY regions.region_name;

And, with that sorting clause, we can reverse order (the DESC means descending, if not specified, ordering is assumed to be ASC or ascending):

SELECT regions.* FROM regions ORDER BY regions.region_name DESC;

Because SQL is built on clauses, we can assemble the ORDER BY clause together with the prior WHERE clause:

SELECT regions.* FROM regions WHERE regions.region_id >= 2 AND regions.region_name IN ('Asia', 'Americas') ORDER BY region_name;

See how these SQL clauses are used to assemble an entire statement? SQL is really just assembling clauses together. If you only want to return a specific number of rows, then we can use the LIMIT clause:

SELECT regions.* FROM regions LIMIT 1;

As with all clauses, the LIMIT clause can be integrated with other clauses:

SELECT regions.* FROM regions WHERE regions.region_id >= 2 AND regions.region_name IN ('Asia', 'Americas') ORDER BY region_name LIMIT 1;

Let me just take a second to say: SQL requires your clauses to be ordered properly. So far, we have been using a regions table that has only an region_id and a region_name field. SQL databases are not static. Far from it, we can easily manipulate and update the tables. Guess how we manipulate the tables? We also have SQL clauses for that. Below, we will use the ALTER TABLE and ADD COLUMN clauses:

ALTER TABLE regions ADD COLUMN region_population FLOAT DEFAULT NULL; COMMENT ON COLUMN regions.region_population IS 'population in billions';

Excellent, run that command above, and it will add new column to the regions table. We have also used the COMMENT clause to add metadata about the column to the schema, this is optional but helpful when you have potentially ambiguous data. Now, if you run SELECT * FROM regions; you’ll see this new column, but it will be empty. Next, let’s use an UPDATE statement to update the population for each of the regions:

UPDATE regions SET region_population = 1.002 WHERE regions.region_name = 'Americas';

Above, you’ll see that the UPDATE statements use the SET and WHERE clauses. The SET clause specifies the values to set on the updated table and the WHERE clause restricts which rows the values are set for. UPDATE statements are powerful, and can change the values for many rows all at once. For instance, what if we left off the WHERE clause above and ran the following it would update all rows:

UPDATE regions SET region_population = 1.002;

The response in the terminal is UPDATE 4, which means we updated 4 ROWS! Uh oh … now if you run SELECT * FROM regions , you’ll see the same value is set for all regions (that’s ok, we can fix that):

UPDATE regions SET region_population = 1.002 WHERE regions.region_name = 'Americas'; UPDATE regions SET region_population = 0.493 WHERE regions.region_name = 'Middle East and Africa'; UPDATE regions SET region_population = 4.561 WHERE regions.region_name = 'Asia'; UPDATE regions SET region_population = 0.7464 WHERE regions.region_name = 'Europe';

As you can see from usage of the ALTER TABLE, COMMENT, and UPDATE statements we are assembling clauses together in fairly predictable ways. Now, let’s go back to the SELECT statement and see what we can do with this new data that we’ve added.

Using SQL, we can manipulate values as well. For instance, we can SUM all of the values returned by a column:

SELECT sum(regions.region_population) FROM regions;

This will sum all populations for all regions, but we can use the WHERE conditional to sum the values of Europe, Asia, Middle East, and Africa:

SELECT sum(regions.region_population) FROM regions WHERE regions.region_name IN ('Middle East and Aftrica', 'Asia', 'Europe');

I see that we are missing Australia from the list of regions! Let us add Australia using an INSERT statement:

INSERT INTO regions (region_name, region_population) VALUES ('Australia', 0.02569);

Now, when running SELECT * FROM regions; you’ll see Australia added to the list.

Summary

SQL is just a statement with a series of clauses appended to the statement to modify behavior. The SQL statements that we covered are just a small number of the total number of statements available:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • CREATE TABLE
  • ALTER TABLE

Within SQL statements, we have a list of clauses, and those clauses are sometimes shared across different statements. The clauses we used were:

  • FROM
  • WHERE
  • LIMIT
  • SET
  • GROUP BY
  • ADD COLUMN

Loading terminal...

Loading terminal...