※ PostgreSQL: pgvector.
Hi, this is dewstream.
It’s been a while. With security audit season keeping me busy through March, I haven’t had much time to publish new posts.
Recently, as part of proposing a new service, I started evaluating PostgreSQL’s pgvector as a vector database for LLM-based workloads.
In this post, I’ll walk through how to install pgvector and run a simple end-to-end test to see it in action.
GitHub - pgvector/pgvector: Open-source vector similarity search for Postgres
1. What is pgvector?
pgvector is an open-source PostgreSQL extension that enables vector similarity search directly inside PostgreSQL. It allows you to store high-dimensional embedding vectors—such as those generated by LLMs—and perform efficient similarity queries using standard SQL.
With pgvector, you can define a vector column type, insert embeddings alongside your relational data, and run nearest-neighbor searches using distance operators like L2 (Euclidean), cosine distance, or inner product. It also supports indexing methods such as IVFFlat and HNSW to accelerate approximate nearest neighbor (ANN) search for large-scale datasets.
Because pgvector runs natively within PostgreSQL, it integrates seamlessly with existing schemas, transactions, access control, and backup strategies. This makes it especially attractive for production systems that want to add semantic search or retrieval-augmented generation (RAG) capabilities without introducing a separate vector database infrastructure.
In short, pgvector extends PostgreSQL into a vector database for LLM-powered applications—while preserving the reliability, consistency, and operational maturity of the PostgreSQL ecosystem.
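Before installing anything, it may help to see the math behind pgvector's distance operators (`<->` L2 distance, `<=>` cosine distance, `<#>` negative inner product). The following is an illustrative numpy sketch of what each operator computes, not pgvector code itself:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

# <->  L2 (Euclidean) distance
l2 = np.linalg.norm(a - b)                                             # ≈ 1.7321

# <=>  cosine distance = 1 - cosine similarity
cos_dist = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # ≈ 0.0074

# <#>  negative inner product (negated so that smaller always means closer)
neg_ip = -np.dot(a, b)                                                 # = -20.0

print(l2, cos_dist, neg_ip)
```

All three operators return values where smaller means "more similar", which is why similarity queries always use `ORDER BY ... LIMIT k`.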
2. Installation guide
This guide assumes that PostgreSQL is already installed on your system.
If you haven’t installed PostgreSQL yet, please refer to the related installation post below.
PostgreSQL: How to Install PostgreSQL on RHEL-Based Linux
※ Installing PostgreSQL 16 on Linux from source. ※ Version: Linux 8.10 (Rocky), PostgreSQL 16.4. ※ For a single-node environment, at least 2 CPU cores and 4 GB of memory are recommended. Follow the steps in order and you can experience the magic of a completed installation.
dewstream.tistory.com
Note:
The test environment used in this guide is Rocky Linux 8.10 with PostgreSQL 18.2.
2.1 Requirements
If you install pgvector by cloning the source code from GitHub and running make && make install, the following components are required:
| Component | Requirement |
|---|---|
| PostgreSQL Server | Must already be installed |
| PostgreSQL Development Package | postgresql-devel (or postgresqlXX-devel matching your version) |
| Build Tools | gcc, make |
| pg_config | Must be included in the PostgreSQL installation and available in PATH |
The key requirement is the postgresql-devel package.
This package provides the necessary header files and build infrastructure required to compile PostgreSQL extensions.
Example for Rocky Linux:
dnf install postgresql18-devel gcc make
2.2 Installation Steps (from Source)
▸Step 1. As root
cd /home/postgres
# Clone the repository
git clone https://github.com/pgvector/pgvector.git
# Set the pg_config path (adjust to your PostgreSQL installation path)
export PG_CONFIG=/postgres/bin/pg_config
cd pgvector
make
# Adjust ownership so the postgres user can run make install
chown -R postgres:postgres /home/postgres/pgvector
▸Step 2. As postgres user
# Set pg_config again in this shell (adjust to your PostgreSQL installation path)
export PG_CONFIG=/postgres/bin/pg_config
cd pgvector
make install
▸Step 3. Enable the Extension
# Connect to PostgreSQL
psql
-- Create the extension
CREATE EXTENSION vector;
▸Step 4. Verify the Installed Version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
▸Step 5. Basic Functionality Test
-- Create a test table
CREATE TABLE test_vec (
id bigserial PRIMARY KEY,
embedding vector(3)
);
-- Insert sample data
INSERT INTO test_vec (embedding)
VALUES
('[1,2,3]'),
('[2,3,4]'),
('[10,10,10]');
-- Run a similarity search using L2 distance
SELECT *
FROM test_vec
ORDER BY embedding <-> '[1,2,3]'
LIMIT 3;
3. Test with Python (Local Embeddings with SentenceTransformers)
This section continues from the source-based installation above and walks you through a fully reproducible end-to-end test:
- create embeddings locally (no external API),
- load them into PostgreSQL with pgvector,
- run similarity queries,
- benchmark vector indexes (HNSW vs IVFFlat).
Test environment
- OS: Rocky Linux 8.10
- PostgreSQL: 18.x (example: 18.2)
- DB role: postgres
- Connection target: host=127.0.0.1 port=5432 dbname=postgres user=postgres
- Execution user (OS): postgres
Why venv?
venv (Python virtual environment) isolates Python packages per project. It prevents conflicts with system Python packages and allows you to cleanly remove the environment by deleting the venv directory.
3.1 Install Python 3.11 + pip (one-time)
▸Step 1. Run as root (or with sudo)
sudo dnf -y install python3.11 python3.11-pip
python3.11 --version
▸Step 2. Create a venv and install dependencies (as postgres OS user)
python3.11 -m venv ~/pgvec-venv
source ~/pgvec-venv/bin/activate
python -m pip install -U pip
python -m pip install "psycopg[binary]" pgvector sentence-transformers numpy
Notes:
sentence-transformers requires Python 3.10+.
pgvector (Python) provides register_vector() integration for psycopg.
▸Step 3. Load sample documents + embeddings into PostgreSQL (as postgres OS user)
This script:
- loads a multilingual embedding model locally,
- creates a schema/table,
- inserts (content, embedding) rows.
# 1. Create ~/load_documents.py
cat > ~/load_documents.py <<'PY'
import os
import re
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
MODEL_NAME = os.environ.get(
"EMBED_MODEL",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
DOCS = [
"How to install PostgreSQL 18 on Rocky Linux",
"How to install pgvector and use the vector type in PostgreSQL",
"Differences between HNSW and IVFFlat indexes in pgvector",
"Cosine similarity vs L2 distance for vector search",
"How to measure query performance with EXPLAIN ANALYZE in PostgreSQL",
"Tuning ef_search and probes for ANN search",
"Using COPY for bulk loading embeddings into PostgreSQL",
"Typical RAG architecture patterns using a vector store",
"Choosing multilingual embedding models for Korean text",
"Creating indexes after bulk loading is usually more efficient",
]
RESET = os.environ.get("RESET", "0") == "1"
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
print(f"Connecting: {DB_DSN}")
print(f"Schema: {schema}")
print(f"Loading model: {MODEL_NAME}")
model = SentenceTransformer(MODEL_NAME)
dim = model.get_sentence_embedding_dimension()
print(f"Embedding dimension: {dim}")
embeddings = model.encode(DOCS, normalize_embeddings=False)
with psycopg.connect(DB_DSN) as conn:
register_vector(conn)
with conn.cursor() as cur:
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema};")
cur.execute(f"""
CREATE TABLE IF NOT EXISTS {schema}.documents (
id bigserial PRIMARY KEY,
content text NOT NULL,
embedding vector({dim}) NOT NULL
);
""")
if RESET:
cur.execute(f"TRUNCATE {schema}.documents;")
print("TRUNCATE done")
rows = [(DOCS[i], embeddings[i]) for i in range(len(DOCS))]
cur.executemany(
f"INSERT INTO {schema}.documents (content, embedding) VALUES (%s, %s);",
rows
)
conn.commit()
print(f"Inserted {len(DOCS)} rows into {schema}.documents")
if __name__ == "__main__":
main()
PY
# 2. Run the loader
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
python ~/load_documents.py
# Verify rows
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" -c "TABLE vecdemo.documents;"
3.2 Run an embedding-based similarity query (as postgres OS user)
▸Step 1. Create ~/query_documents.py
cat > ~/query_documents.py <<'PY'
import os
import re
import sys
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
MODEL_NAME = os.environ.get(
"EMBED_MODEL",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
query = " ".join(sys.argv[1:]).strip() or "How to tune HNSW ef_search in pgvector"
model = SentenceTransformer(MODEL_NAME)
q = model.encode([query], normalize_embeddings=False)[0]
with psycopg.connect(DB_DSN) as conn:
register_vector(conn)
# Cosine distance operator: <=> (smaller is closer)
sql = f"""
SELECT
id,
content,
(embedding <=> %s) AS cosine_distance,
(1 - (embedding <=> %s)) AS cosine_similarity
FROM {schema}.documents
ORDER BY embedding <=> %s
LIMIT 5;
"""
rows = conn.execute(sql, (q, q, q)).fetchall()
print(f"Query: {query}")
print("-" * 70)
for r in rows:
print(f"[id={r[0]}] dist={r[2]:.6f} sim={r[3]:.6f} | {r[1]}")
if __name__ == "__main__":
main()
PY
▸Step 2. Run the query
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
python ~/query_documents.py "Differences between HNSW and IVFFlat in pgvector"
4. HNSW vs IVFFlat: Conceptual Overview and Performance Analysis
Conceptual Differences
pgvector provides two primary approximate nearest neighbor (ANN) index types:
- HNSW (Hierarchical Navigable Small World)
- IVFFlat (Inverted File with Flat quantization)
Although both are designed to accelerate similarity search, they differ fundamentally in structure and tuning behavior.
HNSW
HNSW builds a multi-layer graph structure where each vector is connected to a set of neighboring vectors. During a search, the algorithm navigates the graph from higher layers down to lower ones, progressively refining the search.
Key characteristics:
- Fast and high recall
- Memory-intensive
- No training phase required
- Tuned primarily with:
- m (graph connectivity)
- ef_construction (index build quality)
- hnsw.ef_search (query-time search breadth)
ef_search controls how many candidates are explored during query time.
Higher values increase recall but also increase latency and memory access.
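To make the role of ef concrete, here is a toy, single-layer best-first graph search in Python. This is an illustrative sketch only (real HNSW uses a multi-layer graph with incremental construction); the point is how a larger ef keeps more candidates alive and therefore touches more nodes:

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.random((200, 8)).astype("float32")

def knn_graph(vecs, m=8):
    # Crude stand-in for HNSW's graph: connect each vector to its m nearest neighbors
    d = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=2)
    return [list(np.argsort(row)[1:m + 1]) for row in d]

def greedy_search(vecs, graph, q, ef=10, k=5, entry=0):
    # Best-first search: ef bounds the candidate set, like hnsw.ef_search
    dist = lambda i: float(np.linalg.norm(vecs[i] - q))
    visited = {entry}
    frontier = [(dist(entry), entry)]            # min-heap of nodes to expand
    best = [(-dist(entry), entry)]               # max-heap of the ef best so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break                                # frontier is worse than the ef-th best
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)          # drop the current worst candidate
    return sorted((-d, i) for d, i in best)[:k]  # (distance, id), nearest first

graph = knn_graph(vecs)
q = rng.random(8).astype("float32")
result = greedy_search(vecs, graph, q, ef=40, k=5)
print(result)
```

Raising ef widens the beam: more nodes are visited before the stopping condition triggers, which is exactly the recall-versus-latency trade-off measured in Section 5.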
IVFFlat
IVFFlat partitions the dataset into multiple inverted lists (clusters).
Each query probes a subset of these lists to retrieve candidate vectors, then performs exact distance calculations within those lists.
Key characteristics:
- Requires a training phase (k-means style clustering)
- Memory-efficient compared to HNSW
- Tuned primarily with:
- lists (number of clusters at index build time)
- ivfflat.probes (number of clusters searched at query time)
A higher probes value scans more clusters, increasing recall but also increasing latency.
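The same mechanics can be sketched in a few lines of Python. This is an illustrative toy, not pgvector's implementation: a short k-means pass plays the role of the training phase, and probes controls how many lists are scanned exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.random((1000, 16)).astype("float32")

def train_centroids(vecs, lists=10, iters=5):
    # "Training" phase: a few k-means iterations, analogous to building with `lists`
    centroids = vecs[rng.choice(len(vecs), lists, replace=False)]
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(vecs[:, None] - centroids[None], axis=2), axis=1)
        for c in range(lists):
            if np.any(assign == c):
                centroids[c] = vecs[assign == c].mean(axis=0)
    assign = np.argmin(np.linalg.norm(vecs[:, None] - centroids[None], axis=2), axis=1)
    return centroids, assign

def ivf_search(vecs, centroids, assign, q, probes=1, k=5):
    # Probe the nearest `probes` lists, then compute exact distances inside them
    order = np.argsort(np.linalg.norm(centroids - q, axis=1))[:probes]
    cand = np.where(np.isin(assign, order))[0]
    d = np.linalg.norm(vecs[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]], len(cand)

centroids, assign = train_centroids(vecs)
q = rng.random(16).astype("float32")
ids1, scanned1 = ivf_search(vecs, centroids, assign, q, probes=1)
ids5, scanned5 = ivf_search(vecs, centroids, assign, q, probes=5)
print(f"probes=1 scanned {scanned1} vectors; probes=5 scanned {scanned5}")
```

With one probe only a single cluster's members are re-ranked; each additional probe adds a whole list to the scan, which is why IVFFlat's cost grows in step with probes.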
5. Vector index test (HNSW vs IVFFlat)
In Section 3, the test dataset consists of only 10 documents, so indexing does not provide meaningful results.
To properly evaluate indexing performance, a larger dataset needs to be loaded.
In the following section, we will create a large table populated with random vectors (to test index behavior) and then benchmark both HNSW and IVFFlat.
5.1 Bulk load random vectors (Binary COPY)
▸Step 1. Create ~/load_random_items.py
cat > ~/load_random_items.py <<'PY'
import os
import re
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
ROWS = int(os.environ.get("ROWS", "200000"))
DIMS = int(os.environ.get("DIMS", "128"))
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
print(f"Connecting: {DB_DSN}")
print(f"Generating random embeddings: rows={ROWS}, dims={DIMS}")
embeddings = np.random.rand(ROWS, DIMS).astype("float32")
conn = psycopg.connect(DB_DSN, autocommit=True)
register_vector(conn)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.execute(f"CREATE SCHEMA IF NOT EXISTS {schema};")
conn.execute(f"DROP TABLE IF EXISTS {schema}.items_rand;")
conn.execute(f"CREATE TABLE {schema}.items_rand (id bigserial PRIMARY KEY, embedding vector({DIMS}) NOT NULL);")
print("Loading via binary COPY ...")
cur = conn.cursor()
with cur.copy(f"COPY {schema}.items_rand (embedding) FROM STDIN WITH (FORMAT BINARY)") as copy:
copy.set_types(["vector"])
for i, emb in enumerate(embeddings, 1):
copy.write_row([emb])
if i % 10000 == 0:
print(".", end="", flush=True)
print("\nCOPY done.")
conn.close()
print("Success!")
if __name__ == "__main__":
main()
PY
▸Step 2. Run the loader (example: 200k rows)
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
ROWS=200000 DIMS=128 python ~/load_random_items.py
5.2 Benchmark queries with EXPLAIN (ANALYZE, BUFFERS)
▸HNSW: vary ef_search
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" <<'SQL'
DROP INDEX IF EXISTS vecdemo.vecdemo_items_rand_ivf_cos_idx;
CREATE INDEX IF NOT EXISTS vecdemo_items_rand_hnsw_cos_idx
ON vecdemo.items_rand
USING hnsw (embedding vector_cosine_ops);
ANALYZE vecdemo.items_rand;
SET hnsw.ef_search = 40;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
SET hnsw.ef_search = 200;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
RESET hnsw.ef_search;
SQL
▸ IVFFlat: vary probes
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" <<'SQL'
DROP INDEX IF EXISTS vecdemo.vecdemo_items_rand_hnsw_cos_idx;
CREATE INDEX IF NOT EXISTS vecdemo_items_rand_ivf_cos_idx
ON vecdemo.items_rand
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 200);
ANALYZE vecdemo.items_rand;
SET ivfflat.probes = 1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
SET ivfflat.probes = 20;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
RESET ivfflat.probes;
SQL
Expected outcome:
As you increase hnsw.ef_search or ivfflat.probes, you typically trade speed for better recall.
EXPLAIN (ANALYZE, BUFFERS) output should show index usage and measured execution time.
▸ Troubleshooting checklist
- Authentication failed / password prompt
- Confirm .pgpass permission is 0600
- Confirm pg_hba.conf has 127.0.0.1/32 rule for postgres
- Connection refused
- Confirm PostgreSQL is listening on 127.0.0.1 (listen_addresses) and port=5432
- Extension not found
- Confirm make install completed successfully and CREATE EXTENSION vector; works
- Model download issues
- SentenceTransformers downloads models on first run; ensure outbound HTTPS is allowed
5.3 Performance Results from the Test
HNSW: Effect of ef_search

ef_search = 40
• Execution Time: 1.470 ms
• Buffers (shared hit): 1655
• Plan Cost: 89 → 91
ef_search = 200
• Execution Time: 3.580 ms
• Buffers (shared hit): 5721
• Plan Cost: 303 → 305
Interpretation
Increasing ef_search from 40 to 200:
• Latency increased from 1.47 ms → 3.58 ms (~2.4×)
• Buffer hits increased from 1655 → 5721 (~3.5×)
This behavior is expected.
A higher ef_search means:
• More graph nodes are explored
• More candidate vectors are evaluated
• More memory pages are touched
In short:
• Higher recall → Higher CPU work → Higher buffer usage → Higher latency
The performance increase is gradual and predictable, which is one of HNSW's strengths.
IVFFlat: Effect of probes

probes = 1
• Execution Time: 1.344 ms
• Buffers (shared hit): 147
• Plan Cost: 80 → 82
probes = 20
• Execution Time: 6.989 ms
• Buffers (shared hit): 1769
• Plan Cost: 1564 → 1565
Interpretation
Increasing probes from 1 to 20:
• Latency increased from 1.34 ms → 6.99 ms (~5.2×)
• Buffer hits increased from 147 → 1769 (~12×)
This reflects IVFFlat’s structure:
• Each probe adds another inverted list to scan.
• More lists mean more disk/memory page reads.
• More candidates must be re-ranked.
Compared to HNSW, IVFFlat showed a steeper growth in buffer usage and execution time as search breadth increased.
Direct Comparison Under This Dataset
Under the same workload:
| Configuration | Execution Time |
| HNSW (ef=40) | 1.47 ms |
| IVFFlat (probes=1) | 1.34 ms |
| HNSW (ef=200) | 3.58 ms |
| IVFFlat (probes=20) | 6.99 ms |
Observations:
- At low search depth, both indexes perform similarly.
- As search breadth increases:
- HNSW scales more gradually.
- IVFFlat shows sharper cost growth.
- HNSW appears more stable when increasing recall requirements in this dataset.
However:
These results reflect latency only.
Proper evaluation requires measuring recall (accuracy) alongside execution time.
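Recall can be measured by comparing an ANN result set against exact brute-force results for the same query. A minimal sketch (the "ANN" result here is just exact search over a random half of the data, standing in for an index-backed query):

```python
import numpy as np

rng = np.random.default_rng(42)
vecs = rng.random((5000, 32)).astype("float32")
q = rng.random(32).astype("float32")
k = 10

# Ground truth: exact top-k by L2 distance over the full table
exact = set(np.argsort(np.linalg.norm(vecs - q, axis=1))[:k])

# Stand-in for an ANN result: exact search over a random half of the data
sample = rng.choice(len(vecs), len(vecs) // 2, replace=False)
approx = set(sample[np.argsort(np.linalg.norm(vecs[sample] - q, axis=1))[:k]])

recall_at_k = len(exact & approx) / k
print(f"Recall@{k} = {recall_at_k:.2f}")
```

In a real benchmark you would compute the ground truth once per query (e.g. with the index disabled), run the same queries at each ef_search or probes setting, and plot recall against latency.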
Final Takeaway
This test clearly demonstrates:
- Both indexes respond correctly to tuning parameters.
- Increasing search breadth increases buffer usage and latency.
- HNSW scales more smoothly under higher recall configurations.
- IVFFlat can be extremely fast at low probe counts but becomes significantly more expensive as probes increase.
To complete the evaluation, the next logical step is to measure:
- Recall@10
- Recall@K vs latency curves
- Optimal parameter combinations for your workload
In production systems, the right choice depends on:
- Data size
- Dimensionality
- Update frequency
- Acceptable recall level
- Memory budget
There is no universal winner — only workload-dependent trade-offs.
5.4 Practical Guidance
Choose HNSW when:
- High recall is important
- Memory usage is acceptable
- Query latency must remain predictable
- Dataset is frequently updated (no training phase)
Choose IVFFlat when:
- Memory footprint must be controlled
- Dataset is large but relatively static
- You are willing to tune lists and probes
- Slightly lower recall is acceptable for lower cost
Thanks for reading!
I’ll be back soon with another post—see you next time.