The pgrag extension
Create end-to-end Retrieval-Augmented Generation (RAG) pipelines
What you will learn:
What is RAG?
What's included in a RAG pipeline?
pgrag
functionsHow to use
pgrag
Related resources
Source code
The pgrag
extension and its accompanying model extensions are designed for creating end-to-end Retrieval-Augmented Generation (RAG) pipelines without leaving your SQL client. No additional programming languages or libraries are required. With functions provided by pgrag
and a Postgres database with pgvector
, you can build a complete RAG pipeline via SQL.
Experimental Feature
The pgrag
extension is experimental and actively being developed. Use it with caution as functionality may change.
What is RAG?
RAG stands for Retrieval-Augmented Generation. It's the search for information relevant to a question that includes information alongside the question in a prompt to an AI chat model. For example, "ChatGPT, please answer questions x using information Y".
What's included in a RAG pipeline?
A RAG pipeline includes a number of steps, as illustrated in the following diagram.
The steps outlined above can be organized into two main stages:
- Preparing and indexing the information:
- Load documents and extract text
- Split documents into chunks
- Generate embeddings for chunks
- Store the embeddings alongside chunks
- Handling incoming questions: 5. Vectorize question 6. Use question embedding to find relevant document chunks 7. Retrieve document chunks from database 8. Rerank and take only best-match chunks to answer question 9. Prompt with question + relevant document chunks to answer question 10. Generated answer
What does pgrag support?
With the exception of (4) storing embeddings in the database and (7) Retrieve document chunks from database, which is supported by Postgres with pgvector
, pgrag
supports all of the steps listed above. Specifically, pgrag
supports:
-
Text extraction and conversion
- Simple text extraction from PDF documents (using pdf-extract). Currently, there is no Optical Character Recognition (OCR) or support for complex layout and formatting.
- Simple text extraction from
.docx
documents (using docx-rs). - HTML conversion to Markdown (using htmd).
-
Text chunking
- Text chunking by character count (using text-splitter).
- Text chunking by token count (also using text-splitter).
-
Local embedding and reranking models
- Local tokenising + embedding generation with 33M parameter model bge-small-en-v1.5 (using ort via fastembed).
- Local tokenising + reranking with 33M parameter model jina-reranker-v1-tiny-en (also using ort via fastembed).
note
These models run locally on your Postgres server. They are packaged as separate extensions that accompany
pgrag
, because they are large (>100MB), and because we may want to add support for more models in future in the form of additionalpgrag
model extensions. -
Remote embedding and chat models
Installation
warning
As an experimental extension, pgrag
may be unstable or introduce backward-incompatible changes. We recommend using it only in a separate, dedicated Neon project. To proceed with the installation, you will need to run the following command first:
To install pgrag
to a Neon Postgres database, run the following commands:
The first extension is the pgrag
extension. The other two extensions are the model extensions for local tokenising, embedding generation, and reranking. The three extensions have no dependencies on each other, but all depend on pgvector
. Specifying cascade
ensures that pgvector
is installed.
pgrag functions
This section lists the functions provided by pgrag
. For function usage examples, refer to the end-to-end RAG example below or the pgrag GitHub repository.
-
Text extraction
These functions extract text from PDFs, Word files, and HTML.
rag.text_from_pdf(bytea) -> text
rag.text_from_docx(bytea) -> text
rag.markdown_from_html(text) -> text
-
Splitting text into chunks
These functions split the extracted text into chunks by character count or token count.
rag.chunks_by_character_count(text, max_chars, overlap) -> text[]
rag_bge_small_en_v15.chunks_by_token_count(text, max_tokens, overlap) -> text[]
-
Generating embeddings for chunks
These functions generate embeddings for chunks either directly in the extension using a small but best-in-class model on the database server or by calling out to a 3rd-party API such as OpenAI.
rag_bge_small_en_v15.embedding_for_passage(text) -> vector(384)
rag.openai_text_embedding_3_small(text) -> vector(1536)
-
Generating embeddings for questions
These functions generate embeddings for the questions.
rag_bge_small_en_v15.embedding_for_query(text) -> vector(384)
rag.openai_text_embedding_3_small(text) -> vector(1536)
-
Reranking
This function reranks chunks against the question using a small but best-in-class model that runs locally on your Postgres server.
rag_jina_reranker_v1_tiny_en.rerank_distance(text, text) -> real
-
Calling out to chat models
This function makes API calls to AI chat models such as ChatGPT to generate an answer using the question and the chunks together.
rag.openai_chat_completion(json) -> json
End-to-end RAG example
1. Create a docs
table and ingest some PDF documents as text
2. Create an embeddings
table, chunk the text, and generate embeddings for the chunks (performed locally)
3. Query the embeddings and rerank the results (performed locally)
4. Feed the query and top chunks to a remote AI chat model such as ChatGPT to complete the RAG pipeline
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.