- "How many houses are within the evacuation radius?"
- "Which responder is closest to the call?"
- "How many more miles until the school bus needs routine maintenance?"

PostGIS, like any spatial database, lets you answer these kinds of questions in SQL, using `ST_Distance(geom1, geom2)` to return a distance, or `ST_DWithin(geom1, geom2, radius)` to return a true/false result within a tolerance.

```
SELECT ST_Distance(
'LINESTRING (150 300, 226 274, 320 280, 370 320, 390 370)'::geometry,
'LINESTRING (140 180, 250 230, 350 200, 390 240, 450 200)'::geometry
);
```

It all looks very simple, but under the covers there is a lot of machinery around getting a result fast for different kinds of inputs.

Distance should be easy! After all, we learn how to calculate distance in middle school! The Pythagorean Theorem tells us that the square of the hypotenuse of a right triangle is the sum of the squares of the two other sides.

So, problem solved, right?

Not so fast. Pythagoras gives us the distance between two points, but objects in spatial databases like PostGIS can be much more complex.

How would I calculate the distance between two complex polygons?

The straightforward solution is to just find the distance between every possible combination of edges in the two polygons, and return the minimum of that set.

This is a "quadratic" algorithm, what computer scientists call `O(n^2)`, because the amount of work it generates is proportional to the square of the number of inputs. As the inputs get big, the amount of work gets **very very very big**.
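To make the brute-force idea concrete, here is a toy sketch in Python (not the actual PostGIS code): it compares every edge of one line against every edge of the other. It assumes the inputs do not cross; for disjoint segments the minimum distance always involves an endpoint of one segment, which is what the helper exploits.

```python
import math

def point_seg_dist(p, a, b):
    # distance from point p to segment a-b
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # clamp the projection parameter into [0, 1] to stay on the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def seg_seg_dist(a, b, c, d):
    # for non-crossing segments the minimum involves an endpoint
    return min(point_seg_dist(a, c, d), point_seg_dist(b, c, d),
               point_seg_dist(c, a, b), point_seg_dist(d, a, b))

def brute_force_distance(line1, line2):
    # O(n*m): test every edge of line1 against every edge of line2
    edges1 = list(zip(line1, line1[1:]))
    edges2 = list(zip(line2, line2[1:]))
    return min(seg_seg_dist(a, b, c, d)
               for a, b in edges1 for c, d in edges2)
```

Doubling the number of vertices in both inputs quadruples the number of edge pairs tested, which is the quadratic blow-up in action.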

Fortunately, there are better ways.

The distance implementation in PostGIS has two major code paths:

- For disjoint (non-overlapping) inputs, an optimized calculation; and,
- For overlapping inputs, the brute force calculation.

Disjoint inputs are handled with a clever simplification of the problem space. Because the inputs are disjoint, it is possible to construct a line between the centers of the two inputs.

If every edge in each object is projected down onto the line, it becomes possible to perform a sort of those edges, such that edges that are near on the line are also near in the sorted lists, and near in space.

Starting from the mid-point of each object it is relatively inexpensive to
quickly prune away large numbers of edges that are definitely **not** the
nearest edges, leaving a much smaller number of potential targets that need to
have their distance calculated.
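Here is a hedged Python sketch of the project-and-prune idea, simplified to point sets rather than edges: the gap between two projections is a lower bound on the true distance (the projection is onto a unit vector), so once the projected gap exceeds the best distance found, everything further along the sorted list can be skipped.

```python
import math

def centroid(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def min_dist_projected(pts_a, pts_b):
    # assumes the two sets are disjoint, so the centroid line is usable
    ca, cb = centroid(pts_a), centroid(pts_b)
    ux, uy = cb[0] - ca[0], cb[1] - ca[1]
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    proj = lambda p: p[0] * ux + p[1] * uy
    a = sorted(pts_a, key=proj)   # the O(n log n) sort dominates the cost
    b = sorted(pts_b, key=proj)
    best = math.inf
    for p in reversed(a):         # a's points nearest to b first
        if proj(b[0]) - proj(p) >= best:
            break                 # every remaining pair is farther in 1-D
        for q in b:
            # the projected gap is a lower bound on the true distance
            if proj(q) - proj(p) >= best:
                break
            best = min(best, math.hypot(p[0] - q[0], p[1] - q[1]))
    return best
```

The pruning is exact, but note the worst case (heavily interleaved projections) can still degenerate toward quadratic work; the payoff comes from typical disjoint inputs.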

The cost of creating the projected segments is just `O(n)`, but the cost of the sort step is `O(n*log(n))`, so the overall cost of the algorithm is `O(n*log(n))`.

This is all well and good, but what if the inputs **do** overlap? Then the algorithm falls back to brute-force and `O(n^2)`. Is there any way to avoid that?

The project-and-prune approach is very clever, but it is possible to generate a spatially searchable representation of the edges even faster, by using the fact that edges in a LineString or LinearRing are highly spatially autocorrelated:

- The end point of one edge is **always** the start point of the next.
- The edges **mostly** don't cross each other.

Basically, the edges are already spatially pre-sorted. That means it is possible to build a decent tree structure from them without incurring any non-linear computational cost.

Start with the edges in sorted order. The bounds of the edges form the leaf
nodes of a spatial tree. Merge neighboring leaf nodes, now you have the first
level of interior nodes. Continue until you have only one node left, that is
your root node. The cost is `O(n) + O(0.5n) + O(0.25n) + ...`, which is to say,
in aggregate, `O(n)`.

Ordinarily, building a spatial tree would be expected to cost about
`O(n*log(n))`, so this is a nice win.
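A sketch, in Python, of that bottom-up construction over pre-sorted bounding boxes; the real `CIRC_NODE` code works on edges and spherical caps, but the merging pattern is the same: each pass over the node list halves it, so the total work is `n + n/2 + n/4 + ... = O(n)`.

```python
def build_tree(boxes):
    # boxes: list of (xmin, ymin, xmax, ymax) in spatial (edge) order
    nodes = [{"box": b, "children": []} for b in boxes]   # leaf level: O(n)
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            pair = nodes[i:i + 2]     # merge neighboring nodes pairwise
            box = (min(n["box"][0] for n in pair),
                   min(n["box"][1] for n in pair),
                   max(n["box"][2] for n in pair),
                   max(n["box"][3] for n in pair))
            merged.append({"box": box, "children": pair})
        nodes = merged                # each pass halves the node count
    return nodes[0]                   # the single remaining node is the root
```

Because the leaves arrive pre-sorted, neighboring nodes are also spatial neighbors, so the interior boxes stay tight without any sorting or splitting heuristics.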

The `CIRC_NODE` tree used to accelerate distance calculation for the `geography` type is built using this process.

There is no guarantee that a tree-indexed approach will crack the overlapping polygon problem.

Disjoint polygons are very amenable to distance searching trees, because it is easy to discard whole branches of the tree that are definitionally too far away to contain candidate edges.

As inputs begin to overlap, it becomes harder to discard large portions of the trees, and as a result a lot of computation is spent traversing the tree, even if a moderate proportion of candidates can be discarded from the lower branches of the tree.

The distance calculation in PostGIS has not been touched in many years, for good reason: it is critically important, so any re-write must be demonstrably an improvement on the existing code, over all known (and unknown) use cases.

However, there is some already-built and tested code in the code base which has never been turned on: the `RECT_TREE`.

Like the `CIRC_NODE` tree in geography, this implementation is based on building a tree from spatially coherent inputs. Unlike the `CIRC_NODE` tree, it has not been proven to be faster than the existing implementation in all cases.

A next development step will be to revive this implementation, evaluate its implementation efficiency, and test its effectiveness:

- Can it exceed the current sort-and-prune strategy for disjoint polygons?
- Can it exceed brute-force for overlapping polygons?

We previously looked at the popular DBSCAN spatial clustering algorithm, which builds clusters based on spatial density.

This post explores the features of the PostGIS ST_ClusterKMeans function. K-means clustering is having a moment, as a popular way of grouping very high-dimensional LLM embeddings, but it is also useful in lower dimensions for spatial clustering.

ST_ClusterKMeans will cluster 2-dimensional and 3-dimensional data, and will also perform weighted clustering on points when weights are provided in the "measure" dimension of the points.
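For intuition, here is a minimal Python sketch of the k-means loop (Lloyd's algorithm) that underlies `ST_ClusterKMeans`; note the "seed from the first k points" initialization is a simplification of this sketch, not what the PostGIS function does.

```python
def kmeans(points, k, iters=10):
    # sketch only: seed centers from the first k points
    centers = list(points[:k])
    assign = []
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        assign = [min(range(k),
                      key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                    (p[1] - centers[i][1]) ** 2)
                  for p in points]
        # update step: each center moves to the mean of its members
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                centers[i] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return assign, centers
```

The two alternating steps (assign, then re-center) are all there is to it; everything else is initialization strategy and stopping criteria.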

To try out K-Means clustering we need some points to cluster, in this case the 1:10M populated places from Natural Earth.

Download the GIS files and load them into your database, in this example using ogr2ogr.

```
ogr2ogr \
-f PostgreSQL \
-nln popplaces \
-lco GEOMETRY_NAME=geom \
PG:'dbname=postgres' \
ne_10m_populated_places_simple.shp
```

A simple clustering in 2D space looks like this, using 10 as the number of clusters:

```
CREATE TABLE popplaces_geographic AS
SELECT geom, pop_max, name,
ST_ClusterKMeans(geom, 10) OVER () AS cluster
FROM popplaces;
```

Note that pieces of Russia are clustered with Alaska, and Oceania is split up. This is because we are treating the longitude/latitude coordinates of the points as if they were on a plane, so Alaska is very far away from Siberia.

For data confined to a small area, effects like the split at the dateline do not matter, but for our global example, it does. Fortunately there is a way to work around it.

We can convert the longitude/latitude coordinates of the original data to a geocentric coordinate system using ST_Transform. A "geocentric" system is one in which the origin is the center of the Earth, and positions are defined by their X, Y and Z distances from that center.

In a geocentric system, positions on either side of the dateline are still very close together in space, so it's great for clustering global data without worrying about the effects of the poles or date line. For this example we will use EPSG:4978 as our geocentric system.
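The conversion is easy to sketch in Python using a spherical approximation (EPSG:4978 uses the WGS84 ellipsoid, so real values differ slightly); the point is that two places facing each other across the dateline come out close together in XYZ space.

```python
import math

def geocentric(lon_deg, lat_deg, radius=6378137.0):
    # spherical approximation of an Earth-centered XYZ conversion
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))

# two points straddling the dateline are ~22 km apart in XYZ,
# not half a world apart as their raw longitudes suggest
gap = math.dist(geocentric(179.9, 0), geocentric(-179.9, 0))
```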

Here are the coordinates of New York, converted to geocentric.

```
SELECT ST_AsText(ST_Transform(ST_PointZ(-74.0060, 40.7128, 0, 4326), 4978), 1);
```

```
POINT Z (1333998.5 -4654044.8 4138300.2)
```

And here is the cluster operation performed in geocentric space.

```
CREATE TABLE popplaces_geocentric AS
SELECT geom, pop_max, name,
ST_ClusterKMeans(
ST_Transform(
ST_Force3D(geom),
4978),
10) OVER () AS cluster
FROM popplaces;
```

The results look very similar to the planar clustering, but you can see the "whole world" effect in a few places, like how Australia and all the islands of Oceania are now in one cluster, and how the dividing point between the Siberia and Alaska clusters has moved west across the date line.

It's worth noting that this clustering has been performed in three dimensions (since geocentric coordinates require an X, Y and Z), even though we are displaying the results in two dimensions.

In addition to naïve k-means, ST_ClusterKMeans can carry out weighted k-means clustering, to push the cluster locations around using extra information in the "M" dimension (the fourth coordinate) of the input points.

Since we have a "populated places" data set, it makes sense to use population as a weight for this example. The weighted algorithm requires strictly positive weights, so we filter out the handful of records that are non-positive.

```
CREATE TABLE popplaces_geocentric_weighted AS
SELECT geom, pop_max, name,
ST_ClusterKMeans(
ST_Force4D(
ST_Transform(ST_Force3D(geom), 4978),
mvalue => pop_max
),
10) OVER () AS cluster
FROM popplaces
WHERE pop_max > 0;
```
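Conceptually, the weighting changes only the update step of k-means: each center becomes a weighted mean of its members, so it is pulled toward heavily weighted points. A minimal sketch of that weighted update:

```python
def weighted_centroid(points, weights):
    # the cluster center moves toward heavily weighted points;
    # weights must be strictly positive, as with ST_ClusterKMeans
    total = sum(weights)
    return (sum(w * p[0] for p, w in zip(points, weights)) / total,
            sum(w * p[1] for p, w in zip(points, weights)) / total)
```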

Again, the differences are subtle, but note how India is now a single cluster, how the Brazil cluster is now biased towards the populous eastern coast, and how North America is now split into east and west.

The `ST_ClusterDBSCAN` function in PostGIS is a quick and easy way to extract clusters from point data. DBSCAN specifically works with density and is well suited for population or density type spatial data. To demonstrate `ST_ClusterDBSCAN` I'm going to work with the geographic names data, specifically the schools, and show how we can quickly create a U.S. population density map.

Let's explore clustering using geographic names data.

Create a table to hold the data. Note that the table is generating the points automatically from the longitude/latitude (EPSG:4326) and transforming into a planar projection for the USA (EPSG:5070).

```
CREATE TABLE geonames (
geonameid integer,
name text,
asciiname text,
alternatenames text,
latitude float8,
longitude float8,
fclass char,
fcode text,
country text,
cc2 text,
admin1 text,
admin2 text,
admin3 text,
admin4 text,
population bigint,
elevation integer,
dem text,
timezone text,
modification date,
geom geometry(point, 5070)
GENERATED ALWAYS AS
(ST_Transform(ST_Point(longitude, latitude, 4326),5070)) STORED
);
```

Now load the table. Note the super fun use of `PROGRAM` to pull data directly from the web and feed a `COPY`.

```
COPY geonames
FROM PROGRAM '(curl http://download.geonames.org/export/dump/US.zip > /tmp/US.zip) && unzip -p /tmp/US.zip US.txt'
WITH (FORMAT CSV, DELIMITER E'\t', HEADER false);
```

(This trick only works using the `postgres` superuser, since it involves calling a program and writing to system disk. If you do not have superuser access, download and unzip the `US.TXT` file by hand and load it using `COPY` from the file.)

Finally, add a spatial index to the `geom` column.

```
CREATE INDEX geonames_geom_x
ON geonames
USING GIST (geom);
```

There are 434 distinct feature codes in the `geonames` table. We will restrict our analysis to just the 205,848 schools, with an `fcode` of `SCH`.

```
SELECT Count(DISTINCT fcode) FROM geonames;
SELECT Count(fcode) FROM geonames WHERE fcode = 'SCH';
```

Schools are an interesting feature to analyze because there is a nice strong correlation between the number of schools and the population. There are a lot of schools! But they are not uniformly distributed.

If we zoom into the midwest, the concentration of schools in populated places
pops out. **We can use PostGIS to turn this distribution difference into a data
set of populated places!**

The DBSCAN clustering algorithm is a "density based spatial clustering of applications with noise". The PostGIS ST_ClusterDBSCAN implementation is a window function that takes three parameters:

- The geometries to be analyzed for clusters.
- An 'eps' distance tolerance. Geometries must be within this distance to be added to a cluster.
- A 'minpoints' count. If a point is within the 'eps' distance of at least 'minpoints' cluster members, it is a "core member" of the cluster.

An input geometry is added to a cluster if it is either:

- A "core" geometry, that is within eps distance of at least minpoints input geometries (including itself); or
- A "border" geometry, that is within eps distance of a core geometry.
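The rules above can be sketched in a few lines of Python; this toy version does a naive O(n^2) neighborhood search, whereas PostGIS drives the same logic with a spatial index.

```python
def dbscan(points, eps, minpoints):
    n = len(points)
    def neighbors(i):
        px, py = points[i]
        return [j for j in range(n)
                if (px - points[j][0]) ** 2 + (py - points[j][1]) ** 2 <= eps ** 2]
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < minpoints:        # not a core point (count includes itself)
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= minpoints:     # j is itself core: expand the cluster through it
                queue.extend(jn)
    return labels                        # cluster id per point, -1 for noise
```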

How does this play out in practice?

If we zoom further into Chicago, around the suburban/exurban transition, the schools are about 1000 meters apart, sometimes more sometimes less, transitioning out to 2000 meters and more in the exurbs.

For our clusters, we will use:

- An `eps` distance of 2000m
- A `minpoints` of 5
- A partition on the state code (`admin1`) to cut down on the number of cluster numbers.

```
CREATE TABLE geonames_sch AS
SELECT ST_ClusterDBScan(geom, 2000, 5)
OVER (PARTITION BY admin1) AS cluster, *
FROM geonames
WHERE fcode = 'SCH';
```

The result looks like this, with each cluster given a distinct color, and un-clustered schools rendered transparent.

The smaller clusters look a little arbitrary, but if we zoom in, we can see that even small population centers have been surfaced with this analytical technique.

Here is Kankakee, Illinois, neatly identified as a populated place by its cluster of schools.

Now that we have clusters, getting a populated place point is as simple as using the ST_Centroid function.

```
CREATE TABLE geonames_popplaces AS
SELECT ST_Centroid(ST_Collect(geom))::geometry(Point, 5070) AS geom,
Count(*) AS school_count,
cluster, admin1
FROM geonames_sch
GROUP BY cluster, admin1;
```

We have completed the analysis, converting the density difference in school locations into a set of derived populated place points.

Now for the whole population cluster map!

- Create a cluster table using `ST_ClusterDBScan`
- Set an `eps` distance tolerance
- Set a `minpoints` count to reduce noise
- Partition on a different field to cut down on the number of cluster numbers
- Create a final table of cluster centroids using `ST_Centroid`

PostgreSQL comes with just a few simple foundational functions that can be used to fulfill most needs for randomness.

Almost all your randomness needs will be met with the `random()` function.

The `random()` function returns a double precision float in a continuous uniform distribution between 0.0 and 1.0.

What does that mean? It means that you could get any value between 0.0 and 1.0, with equal probability, for each call of `random()`.

Here are five uniform random numbers between 0.0 and 1.0.

```
SELECT random() FROM generate_series(1, 5)
```

```
0.3978842227698167
0.7438732417540841
0.3875091442400458
0.4108009373061563
0.5524543763568912
```

Yep, those look pretty random! But, maybe not so useful?

Most times when people are trying to generate random numbers, they are looking
for random **integers** in a range, not random floats between 0.0 and 1.0.

Say you wanted random integers between 1 and 10, inclusive. How do you get that, starting from `random()`?

Start by scaling an ordinary `random()` number up by a factor of 10! Now you have a continuous distribution between 0.0 and 10.0.

```
SELECT 10 * random() FROM generate_series(1, 5)
```

```
3.978842227698167
7.438732417540841
3.875091442400458
4.108009373061563
5.5245437635689125
```

Then, if you push every one of those numbers down to the nearest integer using `floor()`, you'll end up with a random integer between 0 and 9.

```
SELECT floor(10 * random()) FROM generate_series(1, 5)
```

```
4
8
4
5
6
```

If you wanted a random integer between 1 and 10, you just need to add 1 to the zero-based number.

```
SELECT floor(10 * random()) + 1 FROM generate_series(1, 5)
```

```
3
7
3
4
5
```
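The same scale-floor-shift construction, sketched in Python (whose `random()` is, like PostgreSQL's, uniform on [0, 1)):

```python
import math
import random

def random_int(low, high, rng=random.random):
    # scale the uniform draw up, floor it, then shift: the same
    # construction as the SQL floor(10 * random()) + 1
    return math.floor((high - low + 1) * rng()) + low
```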

Sometimes the things you are trying to do randomly aren't numbers. How do you get a random entry out of an array of strings? Or a random row from a table?

We already saw how to get one-based integers from `random()`, and we can apply that technique to the problem of pulling an entry from an array.

```
WITH f AS (
SELECT ARRAY[
'apple',
'banana',
'cherry',
'pear',
'peach'] AS fruits
)
SELECT fruits[ceil(array_length(fruits,1) * random())] AS snack
FROM f;
```

```
snack
-------
peach
```

Getting a random row involves some tradeoffs and thinking. For a random value from a small table, the naive way to get a single random value is this.

```
SELECT *
FROM fruits
ORDER BY random()
LIMIT 1
```

As you can imagine, this gets quite expensive if the `fruits` table gets too large, since it sorts the whole table every time.

If you only need a single random row, one way to achieve that is to add a random column to your table and index it.

```
CREATE TABLE fruits (
id SERIAL PRIMARY KEY,
fruit TEXT NOT NULL,
random FLOAT8 DEFAULT random()
);
INSERT INTO fruits (fruit)
VALUES ('apple'),('banana'),('cherry'),('pear'),('peach');
CREATE INDEX fruits_random_x ON fruits (random);
```

Then when it's time to search, use the random function to generate a starting search location and find the next highest value.

```
SELECT *
FROM fruits
WHERE random > random()
ORDER BY random ASC
LIMIT 1;
```

```
id | fruit | random
----+--------+--------------------
8 | banana | 0.1997961574379754
```

Be careful using this trick for more than one row though: since the values in the random column are fixed, the sequences of rows returned will be deterministic, even if the start row is random.
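The index search above is essentially a binary search over the stored random keys. A Python sketch, with one behavioral difference worth noting: when the fresh draw beats every stored key, the SQL above returns no row, while this sketch wraps around to the first row.

```python
import bisect

def pick_row(sorted_keys, draw):
    # index of the first stored key strictly greater than the draw,
    # mirroring: WHERE random > draw ORDER BY random LIMIT 1
    i = bisect.bisect_right(sorted_keys, draw)
    return i % len(sorted_keys)   # wrap instead of returning nothing
```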

If you want to pull large portions of a table into a query (for random sampling, for example) look at the `TABLESAMPLE` clause of the `SELECT` command.

Suppose I wanted the entire contents of the fruits collection, but returned in two random groups? This is actually much like getting a single random value: order the whole set randomly, and then use that ordering to determine grouping.

```
WITH random_fruits AS (
SELECT id, fruit
FROM fruits
ORDER BY random()
)
SELECT row_number() over () % 2 AS group,
id, fruit
FROM random_fruits
ORDER BY 1;
```

```
group | id | fruit
-------+----+--------
0 | 11 | peach
0 | 8 | banana
1 | 10 | pear
1 | 7 | apple
1 | 9 | cherry
```

The '2' in the example above is the number of groups desired.


So far we have just been looking at ways to permute the uniform distribution offered by the `random()` function. But there is in fact an infinite number of other probability distributions that random numbers could be drawn from.

Of that infinite collection, by far the most frequently used in practice is the "normal distribution" also known as the "Gaussian distribution" or "bell curve".

Rather than having a hard cut-off point, the normal distribution has a frequent center and then ever lower probability of values out to infinity in both directions.

The position of the center of the distribution is the "mean" and the rate of probability decay is controlled by the "standard deviation".

To generate normally distributed data in PostgreSQL, use the `random_normal(mean, stddev)` function that was introduced in version 16.

```
SELECT random_normal(0, 1)
FROM generate_series(1,10)
ORDER BY 1
```

```
-0.8147201382612904
-0.5751449000210354
-0.4643454485382744
-0.0630592935151314
0.26438942114339203
0.39298889191244274
0.4946046063256206
0.8560911955145666
1.3534309793797454
1.664493506727331
```

It's kind of hard to appreciate that the data have a central tendency without generating a lot more of them and counting how many fall within each bin.

```
SELECT random_normal()::integer,
Count(*)
FROM generate_series(1,1000)
GROUP BY 1
ORDER BY 1
```

The cast to `integer` rounds the values to the nearest integer, so you can see how the data mostly fall within two standard deviations of the mean.

```
random_normal | count
---------------+-------
-3 | 5
-2 | 65
-1 | 233
0 | 378
1 | 246
2 | 67
3 | 5
4 | 1
```

If you looked **very** closely at the examples in the first section you'll have
noticed that they all started from the same, allegedly random values.

If `random()` truly is random, how did I get the same starting values four times in a row?

The answer, shockingly, is that `random()` is actually "pseudo-random".

A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process.

With a pseudo-random number generator and a known starting point, I will always get the same sequence of numbers, at least on the same computer.

The reason most computer programs use pseudo-random number generators is that generating truly random numbers is actually quite an expensive operation (relatively speaking).

So programs instead generate one truly random number, and use that as a "seed" for a generator.
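The same determinism is easy to demonstrate in Python (its Mersenne Twister standing in for PostgreSQL's xoroshiro generator): seed two generators identically and they produce identical sequences.

```python
import random

# two generators given the same seed produce the same "random" sequence
rng1 = random.Random(20240101)
rng2 = random.Random(20240101)
seq1 = [rng1.random() for _ in range(5)]
seq2 = [rng2.random() for _ in range(5)]
assert seq1 == seq2   # deterministic, despite looking random
```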

PostgreSQL uses the Blackman/Vigna `xoroshiro128**` 1.0 pseudo-random number generator.

By default, on start-up PostgreSQL sets up a seed value by calling an external random number generator, using an appropriate method for the platform:

- using OpenSSL `RAND_bytes()` if available; or
- using Windows `CryptGenRandom()` on that platform; or
- using the operating system `/dev/urandom` if necessary.

So if you are interested in a random number, just calling `random()` will get you one every time.

But if you want to put your finger on the scales, you can use the `setseed()` function to cause your `random()` and `random_normal()` functions to generate a deterministic series of random numbers, starting from a seed value you specify.

Random data is important for validating processing chains, analyses and reports. The best way to test a process is to feed it inputs!

Random points are pretty easy -- define an area of interest and then use the PostgreSQL `random()` function to create the X and Y values in that area.

```
CREATE TABLE random_points AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
)
SELECT ST_Point(width * (random() - 0.5) + origin_x,
height * (random() - 0.5) + origin_y,
4326)::Geometry(Point, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
```

Filling a target shape with random points is a common use case, and there's a special function just for that, `ST_GeneratePoints()`. Here we generate points inside a circle created with `ST_Buffer()`.

```
CREATE TABLE random_points AS
SELECT ST_GeneratePoints(
ST_Buffer(
ST_Point(0, 0, 4326),
50),
100) AS geom
```

If you have PostgreSQL 16, you can use the brand new `random_normal()` function to generate coordinates with a central tendency.

```
CREATE TABLE random_normal_points AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
)
SELECT ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326)::Geometry(Point, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
```

If you are not on PostgreSQL 16 yet, here is a user-defined implementation of
`random_normal()`.

```
CREATE OR REPLACE FUNCTION random_normal(
mean double precision DEFAULT 0.0,
stddev double precision DEFAULT 1.0)
RETURNS double precision AS
$$
DECLARE
u1 double precision;
u2 double precision;
z0 double precision;
BEGIN
-- Box-Muller transform: two uniform deviates yield a normal deviate
u1 := random();
u2 := random();
z0 := sqrt(-2.0 * ln(u1)) * cos(2.0 * pi() * u2);
RETURN mean + (stddev * z0);
END;
$$ LANGUAGE plpgsql;
```

Linestrings are a little harder, because they involve more points, and aesthetically we like to avoid self-crossings of lines.

Two-point linestrings are pretty easy to generate with `ST_MakeLine()` -- just generate twice as many random points, and use them as the start and end points of the linestrings.

```
CREATE TABLE random_2point_lines AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT ST_MakeLine(
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326),
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326))::Geometry(LineString, 4326) AS geom,
id
FROM bounds,
generate_series(0, 100) AS id
```

Multi-point random linestrings are harder, at least while avoiding self-intersections, and there are a lot of potential approaches. While a recursive CTE could probably do it, an imperative approach using PL/PgSQL is more readable.

The `generate_random_linestring()` function starts with an empty linestring, and then adds new segments one at a time, changing the direction of the line with each new segment.

`generate_random_linestring()` definition:
```
CREATE OR REPLACE FUNCTION generate_random_linestring(
start_point geometry(Point))
RETURNS geometry(LineString, 4326) AS
$$
DECLARE
num_segments integer := 10; -- Number of segments in the linestring
deviation_max float := radians(45); -- Maximum deviation
random_point geometry(Point);
deviation float;
direction float := 2 * pi() * random();
segment_length float := 5; -- Length of each segment (adjust as needed)
i integer;
result geometry(LineString) := 'SRID=4326;LINESTRING EMPTY';
BEGIN
result := ST_AddPoint(result, start_point);
FOR i IN 1..num_segments LOOP
-- Generate a random angle within the specified deviation
deviation := 2 * deviation_max * random() - deviation_max;
direction := direction + deviation;
-- Calculate the coordinates of the next point
random_point := ST_Point(
ST_X(start_point) + cos(direction) * segment_length,
ST_Y(start_point) + sin(direction) * segment_length,
ST_SRID(start_point)
);
-- Add the point to the linestring
result := ST_AddPoint(result, random_point);
-- Update the start point for the next segment
start_point := random_point;
END LOOP;
RETURN result;
END;
$$
LANGUAGE plpgsql;
```

We can use the `generate_random_linestring()` function now to turn random start points (created in the usual way) into fully random squiggly lines!

```
CREATE TABLE random_lines AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT id,
generate_random_linestring(
ST_Point(random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
4326))::Geometry(LineString, 4326) AS geom
FROM bounds,
generate_series(1, 100) AS id;
```

At the simplest level, a set of random boxes is a set of random polygons, but that's pretty boring, and easy to generate using `ST_MakeEnvelope()`.

```
CREATE TABLE random_boxes AS
WITH bounds AS (
SELECT 0 AS origin_x, 80 AS width,
0 AS origin_y, 80 AS height
)
SELECT ST_MakeEnvelope(
random_normal(origin_x, width/4),
random_normal(origin_y, height/4),
random_normal(origin_x, width/4),
random_normal(origin_y, height/4)
)::Geometry(Polygon, 4326) AS geom,
id
FROM bounds,
generate_series(0, 20) AS id
```

But more interesting polygons have curvy and convex shapes. How can we generate those?

One way is to extract a polygon from a set of random points, using `ST_ConcaveHull()`, and then applying an "erode and dilate" effect to make the curves more pleasantly round.

We start with a random center point for each polygon, and create a circle with `ST_Buffer()`.

Then use `ST_GeneratePoints()` to fill the circle with some random points -- not too many, so we get a nice jagged result.

Then use `ST_ConcaveHull()` to trace a "boundary" around those points.

Then apply a negative buffer, to erode the shape.

And finally a positive buffer to dilate it back out again.

Generating multiple hulls involves stringing together all the above operations with CTEs or subqueries.

```
CREATE TABLE random_hulls AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
),
polypts AS (
SELECT ST_Point(random_normal(origin_x, width/2),
random_normal(origin_y, width/2),
4326)::Geometry(Point, 4326) AS geom,
polyid
FROM bounds,
generate_series(1,10) AS polyid
),
pts AS (
SELECT ST_GeneratePoints(ST_Buffer(geom, width/5), 20) AS geom,
polyid
FROM bounds,
polypts
)
SELECT ST_Multi(ST_Buffer(
ST_Buffer(
ST_ConcaveHull(geom, 0.3),
-2.0),
3.0))::Geometry(MultiPolygon, 4326) AS geom,
polyid
FROM pts;
```

Another approach is to again start with random points, but use the Voronoi diagram as the basis of the polygon.

Start with a center point and buffer circle.

Generate random points in the circle.

Use the `ST_VoronoiPolygons()` function to generate polygons that subdivide the space, using the random points as seeds.

Filter just the polygons that are fully contained in the originating circle.

And then use `ST_Union()` to merge those polygons into a single output shape.

Generating multiple hulls again involves stringing together the above operations with CTEs or subqueries.

```
CREATE TABLE random_delaunay_hulls AS
WITH bounds AS (
SELECT 0 AS origin_x,
0 AS origin_y,
80 AS width,
80 AS height
),
polypts AS (
SELECT ST_Point(random_normal(origin_x, width/2),
random_normal(origin_y, width/2),
4326)::Geometry(Point, 4326) AS geom,
polyid
FROM bounds,
generate_series(1,20) AS polyid
),
voronois AS (
SELECT ST_VoronoiPolygons(
ST_GeneratePoints(ST_Buffer(geom, width/5), 10)
) AS geom,
ST_Buffer(geom, width/5) AS geom_clip,
polyid
FROM bounds,
polypts
),
cells AS (
SELECT (ST_Dump(geom)).geom, polyid, geom_clip
FROM voronois
)
SELECT ST_Union(geom)::Geometry(Polygon, 4326) AS geom, polyid
FROM cells
WHERE ST_Contains(geom_clip, geom)
GROUP BY polyid;
```
