Over the past 12 months, AI has taken over budgets and initiatives. Postgres is a popular store for AI embedding data because it can store, calculate, optimize, and scale using the pgvector extension. A recently introduced gem to the Ruby on Rails ecosystem, the neighbor gem, makes working with pgvector and Rails even better.
An “embedding” is a set of floating point values that represent the
characteristics of a thing (nothing new, we’ve had these since the 70s). Using
the OpenAI API or any of its competitors, you can send over blocks of text,
images, and PDFs, and OpenAI will return an embedding with 1536 values
representing the characteristics. With the pgvector extension, you can store
that embedding in a vector column type on Postgres. Then, using nearest neighbor
calculations, you can find the most-similar objects. For a deeper review of
AI with Postgres, see my previous
posts in this series.
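To make "nearest neighbor" concrete, here is a minimal sketch in plain Ruby, with no database involved. It assumes toy 3-dimensional embeddings with made-up values (real ones have 1536), and the method names are hypothetical. It illustrates the idea only, not how pgvector computes it internally.

```ruby
# Euclidean (L2) distance between two equal-length arrays of floats.
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.length == b.length

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Return the label of the stored embedding closest to the query vector.
def nearest_neighbor(query, embeddings)
  embeddings.min_by { |_label, vector| euclidean_distance(query, vector) }.first
end

# Toy embeddings with made-up values, keyed by a hypothetical recipe name.
EMBEDDINGS = {
  "pancakes" => [0.9, 0.1, 0.0],
  "waffles"  => [0.8, 0.2, 0.1],
  "chili"    => [0.0, 0.9, 0.7]
}.freeze

puts nearest_neighbor([0.88, 0.1, 0.0], EMBEDDINGS)
```

A linear scan like this is fine for a handful of rows. At scale, the same question is what a vector index in Postgres is meant to answer quickly.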
By default, Ruby on Rails does not know about the "vector" data type. If you've used Ruby on Rails + Postgres + pgvector, you've probably written SQL queries in your migrations, and implemented some other janky-code. The neighbor gem will remove the janky-code, and take you back to a native ActiveRecord experience.
At a minimum, all you have to do is add the neighbor gem to your Gemfile.
Side note: I can't overstate the impact Andrew Kane has had on embedding data in Postgres. He's also making it easy for developers to use those vector data types with Ruby on Rails and Node.
The biggest risk of not using Neighbor is that ActiveRecord will create an
incomplete db/schema.rb file. Because ActiveRecord does not understand the
vector data type, running rails db:schema:dump will not fail. Instead, it will omit
any table with that data type and leave this error as a comment in your db/schema.rb:
# Could not dump table "recipe_embeddings" because of following StandardError
#   Unknown type 'vector(1536)' for column 'embedding'
With Neighbor, you'll get a fully-functional schema like the following:
create_table "recipe_embeddings", primary_key: "recipe_id", id: :bigint, default: nil, force: :cascade do |t|
  t.vector "embedding", limit: 1536, null: false
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["embedding"], name: "recipe_embeddings_embedding", opclass: :vector_l2_ops, using: :hnsw
  t.index ["recipe_id"], name: "index_recipe_embeddings_on_recipe_id"
end
Notice that Neighbor also understands the HNSW index type
released with pgvector 0.5.
Side note: for projects that go all-in on Postgres, I opt to use the
following to dump to a structure.sql file:
SCHEMA_FORMAT=sql rails db:schema:dump
Without Neighbor, ActiveRecord is not informed of the vector type. That matters
for your migrations just as it does for your db/schema.rb file. With Neighbor,
a typical migration would look something like the following:
create_table :recipe_embeddings, primary_key: [:recipe_id] do |t|
  t.references :recipe, null: false, foreign_key: true
  t.vector :embedding, limit: 1536, null: false
  t.timestamps
end
Additionally, you get improved handling of the vector data type. Without Neighbor,
working with embedding data required to_s to manipulate the values when inserting
into Postgres. But, with Neighbor, it simplifies to a native process:
RecipeEmbedding.create!(recipe_id: Recipe.last.id, embedding: [-0.078427136, 0.0014401458, ...])
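For contrast, the manual conversion that Neighbor spares you might look roughly like the sketch below. It is plain Ruby, the helper names are hypothetical (they are not part of Neighbor or pgvector), and it assumes pgvector's text form of a bracketed, comma-separated list such as [1.0,2.5,-0.5].

```ruby
# Hypothetical helper: format numbers as a pgvector-style text literal,
# e.g. [1, 2.5] becomes "[1.0,2.5]". Float() raises on non-numeric input.
def to_vector_literal(values)
  "[#{values.map { |v| Float(v) }.join(",")}]"
end

# Hypothetical helper: parse a pgvector-style text literal back into floats.
def from_vector_literal(text)
  text.delete("[]").split(",").map { |v| Float(v.strip) }
end

literal = to_vector_literal([-0.078427136, 0.0014401458])
puts literal
puts from_vector_literal(literal).inspect
```

With Neighbor in place, none of this is needed: the model attribute accepts and returns a plain Ruby array.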
But, wait! There's more …
After you add the embedding column to a table, you can use has_neighbors to
define your nearest neighbor queries:
class RecipeEmbedding < ApplicationRecord
  has_neighbors :embedding
end
Then, you can find the nearest neighbors like so:
recipe_embedding.nearest_neighbors(:embedding, distance: "euclidean").first
The distance calculations include euclidean, shown above; see the neighbor gem's documentation for the full list.
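The choice of distance matters because different metrics can rank the same rows differently. As a rough illustration in plain Ruby (not the gem's implementation), the sketch below compares Euclidean distance with cosine distance, taken here as one minus cosine similarity, another common choice for embeddings. The sample vectors and constant names are made up.

```ruby
# Dot product of two equal-length arrays.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Vector length (L2 norm).
def norm(a)
  Math.sqrt(dot(a, a))
end

# Straight-line distance between two points.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# One minus cosine similarity: compares direction and ignores magnitude.
def cosine_distance(a, b)
  1.0 - dot(a, b) / (norm(a) * norm(b))
end

QUERY  = [1.0, 1.0].freeze
SCALED = [10.0, 10.0].freeze # same direction as QUERY, much larger magnitude
NEARBY = [1.0, 0.0].freeze   # close to QUERY in space, different direction

{ "scaled" => SCALED, "nearby" => NEARBY }.each do |name, vector|
  puts format("%-6s euclidean=%.3f cosine=%.3f",
              name, euclidean_distance(QUERY, vector), cosine_distance(QUERY, vector))
end
```

Here Euclidean distance treats NEARBY as the closer vector, while cosine distance prefers SCALED because it points the same way as the query. Which behavior you want depends on how your embeddings were produced.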
Launching a project to use embeddings with Ruby on Rails?
Step 1: use the neighbor gem
Step 2: provision your database on Crunchy Bridge with pgvector
Step 3: profit
November 3, 2023