Loading Data into PostGIS: An Overview
There are a lot of ways to load data into a PostgreSQL/PostGIS database and it's no different with spatial data. If you're new to PostGIS, you've come to the right place. In this blog post, I'll outline a few free, open source tools you can use for your spatial data import needs.
You can use a desktop GUI application like QGIS, and/or command-line utilities. If you want more flexibility, it's great to have both types in your toolkit.
I'll talk about importing to PostGIS within the context of vector data, since it's a much more common use case. It's possible to import raster data into PostGIS as well. My colleague Paul Ramsey demonstrated how to store and work with raster data in PostGIS in another Crunchy blog post.
Let's start with command-line tools: shp2pgsql and ogr2ogr are probably two of the most widely-used utilities. What's great about the command line is that some bash/shell syntax can help kick your efficiency up a notch or more. You can load datasets with a simple script, or streamline data wrangling or whatever pre-processing of the files you may need to do before import.
I'm using Postgres 13 and PostGIS 3.0 on Windows Subsystem for Linux. The utilities I mention do work on the Windows command line as well, just that the syntax may be a little different.
Shapefiles are a common data format in the GIS world and shp2pgsql is a standard tool for loading shapefiles into PostGIS. It comes with a PostGIS install, and it takes the shapefile data and forms SQL statements that can run against the database.
Use shp2pgsql by either:
- Saving the output into a SQL script
shp2pgsql -s 4326 -I ./Parking_Garages_and_Lots.shp > parking_garages_lots.sql
At this point you can do whatever you like with the SQL file that you have. You can check the script and make changes if necessary. Save the file for use later or hand it off for someone else to work with, etc. To load the output into PostGIS, you can run the script with a Postgres client. Here's an example of how to do so withpsql:
psql -U kat -h localhost -d tampa -f parking_garages_lots.sql
- Or, you could pipe the output directly into psql on the command line
shp2pgsql -s 4326 -I ./Parking_Garages_and_Lots.shp | psql -U kat -h localhost tampa
In both cases, shp2pgsql will:
- Create a new table called
- Set the spatial reference identifier (SRID) to 4326 for the spatial features,
- Create a GiST index on the geometry/geography column.
There are more parameters available if you need shp2pgsql to work in a specific way (such as append to an existing table instead of creating a new table, or use the Postgres dump format, or use a geography type instead of geometry, etc).
ogr2ogr converts feature data between vector data source formats that can be found in the Geospatial Data Abstraction Library (GDAL). That's a lot of formats, so it's a very powerful tool and you can use it in many different scenarios.
ogr2ogr has to be downloaded separately from PostGIS. With that said, it's included with a QGIS install.
Here's an example of a GeoJSON import to PostGIS (watch out for the "" character that delineates multiline commands in Linux):
ogr2ogr \ -f PostgreSQL PG:"host=localhost user=kat password=my_secure_password \ dbname=tampa" ./Park_Polygons.geojson \ -nln park_polygons -lco GEOMETRY_NAME=geom
The above creates a new park_polygons table that will have a geometry column named geom.
I also want to note that ogr2ogr supports shapefiles as well. Is choosing ogr2ogr versus shp2pgsql a matter of opinion? It's probably not as simple as that, but so far I've found shp2pgsql to be a bit more straightforward to use in my own examples. (Except for when I'm not sure which SRID to use: something plenty of GIS newbies encounter, and sometimes more seasoned pros too).
If you're familiar with the psql \copy command, you can use that to load spatial data from a file. For instance, you might have geographic coordinates, in latitude and longitude, included with a dataset in a .txt file (the GeoNames database has files you can download as examples, such as the cities datasets).
You can always run \copy and load latitude and longitude into its own respective columns, and then set the value of a separate geometry type column using PostGIS functions:
UPDATE table SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
You could also define the geometry column to be a calculated column (available in Postgres 12+), so when you add new longitude and latitude values the Point value is automatically set:
CREATE TABLE table( ... geom geometry(Point,4326) GENERATED ALWAYS AS (ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)) STORED );
The SQL COPY command works a bit differently from \copy in that COPY is generally thought of as "server side" while \copy is "client side." But the syntax is almost the same, and on the whole \copy (COPY) should give you faster performance compared to INSERT.
And, at the very least, you can also run simple INSERTs to populate spatial data in PostGIS as well. The values have to be properly formatted for SQL. Check out the Loading Data section of the official PostGIS docs for an example.
What about GUI options?
QGIS is a popular desktop GIS application, and it lets you integrate with databases such as Postgres/PostGIS. I'm using the latest release at the time of writing (version 3.16).
Connect to PostGIS in QGIS
To set up my PostGIS connection, I start from the Browser panel:
Under Connection Information, you'd fill out the Host, Port, and Database fields (or, alternatively, Service). You might need to enable SSL mode depending on your database server requirements:
In the above screenshot I've used basic authentication, but the QGIS docs do recommend more secure methods-an authentication configuration, or a service connection file-if you want to store your credentials.
Once you're connected to PostGIS, you can load spatial data a couple of ways:
A) DB Manager
DB Manager is a core plugin for QGIS, and it's enabled by default in QGIS 3.16. You can open the plugin from Database in the top navigation:
You'd select PostGIS and then the database connection you want to use, and click "Import Layer/File" in the DB Manager window:
For "Input," you can select a layer you've already opened in QGIS, or you can click the dotted button and navigate to where you've saved your spatial data file locally. Running the import through this tool should show you an "Import was successful" dialog when the import is complete.
B) Export to PostgreSQL (use for batch processing)
The other way to use QGIS to import spatial data to the database is by using the "Export to PostgreSQL" option from the Processing Toolbox. The Toolbox gives you easy access to functions or scripts you may use to process and analyze your spatial project.
In the Processing Toolbox panel, double click on "Export to PostgreSQL":
In the configuration window you'll find a lot of the same settings as in the DB Manager import tool, but the doc on QGIS' database algorithm and processing describes these settings as well for your reference.
After you run the export to PostgreSQL, you'll see that the Log should contain a "FINISHED" message.
You may have also noticed the "Run as Batch Process" option in the window. This lets you run the export with multiple QGIS layers or files. The DB Manager import, on the other hand, seems to only handle one layer at a time.
I do like the DB manager import method since it's a bit more intuitive for me personally, but check out this blog post - the author prefers the Export to PostgreSQL option since it uses SQL COPY (which we briefly touched on earlier). They cover additional methods for importing spatial data as well that I don't go into in this post, so I'd highly recommend checking that out as it's a great read either way.
I also want to note that QGIS has plugins that let you use an integrated Python console, or add R scripts. There may be even more interesting and advanced ways to use QGIS for your data pipeline, beyond the core GUI.
If you installed PostGIS on Windows using the Stack Builder utility, you have access to a GUI version of shp2pgsql too.
Navigate to the bin directory of your PostgreSQL installation (for example, C:\Program Files\PostgreSQL\12\bin). You should also see a postgisgui subdirectory. If you open that up, you'll find shp2pgsql-gui.exe:
(Note: shp2pgsql-gui is available as a plugin to pgAdmin 3, but I haven't yet found a way to add it in pgAdmin 4. There is a Geometry Data Viewer in pgAdmin, but that allows you to view spatial data already in your database.)
There also seems to be an ogr2ogr GUI available from http://www.ogr2gui.ca/, although I'm unable to find documentation or recent public links to download the application.
Go forth and import
shp2pgsql, ogr2ogr, and the copy command will probably have you covered for a wide range of spatial vector data that you'd store in PostGIS. QGIS also has functionality for connecting and importing to your database, and since it comes with a spatial viewer out of the box, it's a very handy tool to have at your disposal if you're able to use a desktop GUI.
New to PostGIS and the command-line tools I included here? Get your feet wet with our recently-published Loading Spatial Data course in the Crunchy Data Learning Portal.
If you've been working with PostGIS for a while, do you have other tools you like to use for your spatial data pipeline? Feel free to share in the comments below or with us on Twitter.
December 31, 2020 •More by this author