Rolling the Dice with the PostgreSQL Random Functions
Generating random numbers is a surprisingly common task in programs, whether it's to create test data or to provide a user with a random entry from a list of items.
PostgreSQL comes with just a few simple foundational functions that can be used to fulfill most needs for randomness.
Almost all your random-ness needs will be met with the random()
function.
Uniformity
The random()
function returns a double precision float in a continuous uniform distribution between 0.0 and 1.0.
What does that mean? It means that you could get any value between 0.0 and 1.0, with equal probability, for each call of random()
.
Here's five uniform random numbers between 0.0 and 1.0.
SELECT random() FROM generate_series(1, 5)
0.3978842227698167
0.7438732417540841
0.3875091442400458
0.4108009373061563
0.5524543763568912
Yep, those look pretty random! But, maybe not so useful?
Random Numbers
Most times when people are trying to generate random numbers, they are looking for random integers in a range, not random floats between 0.0 and 1.0.
Say you wanted random integers between 1 and 10, inclusive. How do you get that, starting from random()
?
Start by scaling an ordinary random()
number up be a factor of 10! Now you have a continuous distribution between 0 and 10.
SELECT 10 * random() FROM generate_series(1, 5)
3.978842227698167
7.438732417540841
3.875091442400458
4.108009373061563
5.5245437635689125
Then, if you push every one of those numbers down to the nearest integer using floor()
you'll end up with a random integer between 0 and 9.
SELECT floor(10 * random()) FROM generate_series(1, 5)
4
8
4
5
6
If you wanted a random integer between 1 and 10, you just need to add 1 to the zero-base number.
SELECT floor(10 * random()) + 1 FROM generate_series(1, 5)
3
7
3
4
5
Random Rows and Values
Sometimes the things you are trying to do randomly aren't numbers. How do you get a random entry out of a string? Or a random row from a table?
We already saw how to get one-based integers from random()
and we can apply that technique to the problem of pulling an entry from an array.
WITH f AS (
SELECT ARRAY[
'apple',
'banana',
'cherry',
'pear',
'peach'] AS fruits
)
SELECT fruits[ceil(array_length(fruits,1) * random())] AS snack
FROM f;
snack
-------
peach
Getting a random row involves some tradeoffs and thinking. For a random value from a small table, the naive way to get a single random value is this.
SELECT *
FROM fruits
ORDER BY random()
LIMIT 1
As you can imagine, this gets quite expensive if the fruits
table gets too large, since it sorts the whole table every time.
If you only need a single random row, one way to achieve that is to add a random column to your table and index it.
CREATE TABLE fruits (
id SERIAL PRIMARY KEY,
fruit TEXT NOT NULL,
random FLOAT8 DEFAULT random()
);
INSERT INTO fruits (fruit)
VALUES ('apple'),('banana'),('cherry'),('pear'),('peach');
CREATE INDEX fruits_random_x ON fruits (random);
Then when it's time to search, use the random function to generate a starting search location and find the next highest value.
SELECT *
FROM fruits
WHERE random > random()
ORDER BY random ASC
LIMIT 1;
id | fruit | random
----+--------+--------------------
8 | banana | 0.1997961574379754
Be careful using this trick for more than one row though: since the values in the random column are fixed, the sequences of rows returned will be deterministic, even if the start row is random.
If you want to pull large portions of a table into a query (for random sampling, for example) look at the TABLESAMPLE
clause of the SELECT
command.
Random Groups
Suppose I wanted the entire contents of the fruits collection, but returned in two random groups? This is actually much like getting a single random value: order the whole set randomly, and then use that ordering to determine grouping.
WITH random_fruits AS (
SELECT id, fruit
FROM fruits
ORDER BY random()
)
SELECT row_number() over () % 2 AS group,
id, fruit
FROM random_fruits
ORDER BY 1;
group | id | fruit
-------+----+--------
0 | 11 | peach
0 | 8 | banana
1 | 10 | pear
1 | 7 | apple
1 | 9 | cherry
The '2' in the example above is the number of groups desired.
random_normal
So far we have just been looking at ways to permute the uniform distribution offered by the random()
function. But there is in fact an infinite number of other probability distributions that random numbers could be a part of.
Of that infinite collection, by far the most frequently used in practice is the "normal distribution" also known as the "Gaussian distribution" or "bell curve".
Rather than having a hard cut-off point, the normal distribution has a frequent center and then ever lower probability of values out to infinity in both directions.
The position of the center of the distribution is the "mean" and the rate of probability decay is controlled by the "standard deviation".
To generate normally distributed data in PostgreSQL, use the random_normal(mean, stddev)
function that was introduced in version 16.
SELECT random_normal(0, 1)
FROM generate_series(1,10)
ORDER BY 1
-0.8147201382612904
-0.5751449000210354
-0.4643454485382744
-0.0630592935151314
0.26438942114339203
0.39298889191244274
0.4946046063256206
0.8560911955145666
1.3534309793797454
1.664493506727331
It's kind of hard to appreciate that the data have a central tendency without generating a lot more of them and counting how many fall within each bin.
SELECT random_normal()::integer,
Count(*)
FROM generate_series(1,1000)
GROUP BY 1
ORDER BY 1
The cast to integer
rounds the values towards the nearest integer, so you can see how the data are mostly between the first two standard deviations of the mean.
random_normal | count
---------------+-------
-3 | 5
-2 | 65
-1 | 233
0 | 378
1 | 246
2 | 67
3 | 5
4 | 1
Seeds and Pseudo-randomness
If you looked very closely at the examples in the first section you'll have noticed that they all started from the same, allegedly random values.
If random()
truly is random, how did I get the same starting values four times in a row?
The answer, shockingly, is that random()
is actually "pseudo-random".
A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process.
With a pseudo-random number generator and a known starting point, I will always get the same sequence of numbers, at least on the same computer.
The reason most computer programs use pseudo-random number generators is that generating truly random numbers is actually quite an expensive operation (relatively speaking).
So programs instead generate one truly random number, and use that as a "seed" for a generator.
PostgreSQL uses the Blackman/Vigna "xoroshiro128 1.0" pseudo-random number generator.
By default, on start-up PostgreSQL sets up a seed value by calling an external random number generator, using an appropriate method for the platform:
- Using OpenSSL
RAND_bytes()
if available, or - using Windows
CryptGenRandom()
on that platform, or - using the operating system
/dev/urandom
if necessary.
So if you are interested in a random number, just calling random()
will get you one every time.
But if you want to put your finger on the scales, you can use the setseed()
function to cause your random()
and random_normal()
functions to generate a deterministic series of random numbers, starting from a seed value you specify.