※ PostgreSQL: pgvector.
Hi, this is dewstream.
It’s been a while. With security audit season keeping me busy through March, I haven’t had much time to publish new posts.
Recently, as part of proposing a new service, I started evaluating PostgreSQL’s pgvector as a vector database for LLM-based workloads.
In this post, I’ll walk through how to install pgvector and run a simple end-to-end test to see it in action.
GitHub - pgvector/pgvector: Open-source vector similarity search for Postgres
1. What is pgvector?
pgvector is an open-source PostgreSQL extension that enables vector similarity search directly inside PostgreSQL. It allows you to store high-dimensional embedding vectors—such as those generated by LLMs—and perform efficient similarity queries using standard SQL.
With pgvector, you can define a vector column type, insert embeddings alongside your relational data, and run nearest-neighbor searches using distance operators like L2 (Euclidean), cosine distance, or inner product. It also supports indexing methods such as IVFFlat and HNSW to accelerate approximate nearest neighbor (ANN) search for large-scale datasets.
Because pgvector runs natively within PostgreSQL, it integrates seamlessly with existing schemas, transactions, access control, and backup strategies. This makes it especially attractive for production systems that want to add semantic search or retrieval-augmented generation (RAG) capabilities without introducing a separate vector database infrastructure.
In short, pgvector extends PostgreSQL into a vector database for LLM-powered applications—while preserving the reliability, consistency, and operational maturity of the PostgreSQL ecosystem.
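Before installing anything, it may help to see the math behind pgvector's distance operators (`<->` L2 distance, `<=>` cosine distance, `<#>` negative inner product). The following is an illustrative numpy sketch of what each operator computes, not pgvector code itself:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

# <->  L2 (Euclidean) distance
l2 = np.linalg.norm(a - b)                                             # ≈ 1.7321

# <=>  cosine distance = 1 - cosine similarity
cos_dist = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # ≈ 0.0074

# <#>  negative inner product (negated so that smaller always means closer)
neg_ip = -np.dot(a, b)                                                 # = -20.0

print(l2, cos_dist, neg_ip)
```

All three operators return values where smaller means "more similar", which is why similarity queries always use `ORDER BY ... LIMIT k`.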
2. Installation guide
This guide assumes that PostgreSQL is already installed on your system.
If you haven’t installed PostgreSQL yet, please refer to the related installation post below.
PostgreSQL: How to Install PostgreSQL on RHEL-Based Linux
※ Installing PostgreSQL 16 on Linux from source. ※ Version: Linux 8.10 (Rocky), PostgreSQL 16.4. ※ For a single-node environment, at least 2 CPU cores and 4 GB of memory are recommended. Follow the steps in order and you can experience the magic of a completed installation.
dewstream.tistory.com
Note:
The test environment used in this guide is Rocky Linux 8.10 with PostgreSQL 18.2.
2.1 Requirements
If you install pgvector by cloning the source code from GitHub and running make && make install, the following components are required:
| Component | Requirement |
|---|---|
| PostgreSQL Server | Must already be installed |
| PostgreSQL Development Package | postgresql-devel (or postgresqlXX-devel matching your version) |
| Build Tools | gcc, make |
| pg_config | Must be included in the PostgreSQL installation and available in PATH |
The key requirement is the postgresql-devel package.
This package provides the necessary header files and build infrastructure required to compile PostgreSQL extensions.
Example for Rocky Linux:
dnf install postgresql18-devel gcc make
2.2 Installation Steps (from Source)
▸Step 1. As root
cd /home/postgres
# Clone the repository
git clone https://github.com/pgvector/pgvector.git
# Set the pg_config path (adjust to your PostgreSQL installation path)
export PG_CONFIG=/postgres/bin/pg_config
cd pgvector
make
# Adjust ownership so the postgres user can run make install
chown -R postgres:postgres /home/postgres/pgvector
▸Step 2. As postgres user
# Set pg_config again in this shell (adjust to your PostgreSQL installation path)
export PG_CONFIG=/postgres/bin/pg_config
cd pgvector
make install
▸Step 3. Enable the Extension
# Connect to PostgreSQL
psql
-- Create the extension
CREATE EXTENSION vector;
▸Step 4. Verify the Installed Version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
▸Step 5. Basic Functionality Test
-- Create a test table
CREATE TABLE test_vec (
id bigserial PRIMARY KEY,
embedding vector(3)
);
-- Insert sample data
INSERT INTO test_vec (embedding)
VALUES
('[1,2,3]'),
('[2,3,4]'),
('[10,10,10]');
-- Run a similarity search using L2 distance
SELECT *
FROM test_vec
ORDER BY embedding <-> '[1,2,3]'
LIMIT 3;
3. Test with Python (Local Embeddings with SentenceTransformers)
This section continues from the source-based installation above and walks you through a fully reproducible end-to-end test:
- create embeddings locally (no external API),
- load them into PostgreSQL with pgvector,
- run similarity queries,
- benchmark vector indexes (HNSW vs IVFFlat).
Test environment
- OS: Rocky Linux 8.10
- PostgreSQL: 18.x (example: 18.2)
- DB role: postgres
- Connection target: host=127.0.0.1 port=5432 dbname=postgres user=postgres
- Execution user (OS): postgres
Why venv?
venv (Python virtual environment) isolates Python packages per project. It prevents conflicts with system Python packages and allows you to cleanly remove the environment by deleting the venv directory.
3.1 Install Python 3.11 + pip (one-time)
▸Step 1. Run as root (or with sudo)
sudo dnf -y install python3.11 python3.11-pip
python3.11 --version
▸Step 2. Create a venv and install dependencies (as postgres OS user)
python3.11 -m venv ~/pgvec-venv
source ~/pgvec-venv/bin/activate
python -m pip install -U pip
python -m pip install "psycopg[binary]" pgvector sentence-transformers numpy
Notes:
sentence-transformers requires Python 3.10+.
pgvector (Python) provides register_vector() integration for psycopg.
▸Step 3. Load sample documents + embeddings into PostgreSQL (as postgres OS user)
This script:
- loads a multilingual embedding model locally,
- creates a schema/table,
- inserts (content, embedding) rows.
# 1. Create ~/load_documents.py
cat > ~/load_documents.py <<'PY'
import os
import re
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
MODEL_NAME = os.environ.get(
"EMBED_MODEL",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
DOCS = [
"How to install PostgreSQL 18 on Rocky Linux",
"How to install pgvector and use the vector type in PostgreSQL",
"Differences between HNSW and IVFFlat indexes in pgvector",
"Cosine similarity vs L2 distance for vector search",
"How to measure query performance with EXPLAIN ANALYZE in PostgreSQL",
"Tuning ef_search and probes for ANN search",
"Using COPY for bulk loading embeddings into PostgreSQL",
"Typical RAG architecture patterns using a vector store",
"Choosing multilingual embedding models for Korean text",
"Creating indexes after bulk loading is usually more efficient",
]
RESET = os.environ.get("RESET", "0") == "1"
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
print(f"Connecting: {DB_DSN}")
print(f"Schema: {schema}")
print(f"Loading model: {MODEL_NAME}")
model = SentenceTransformer(MODEL_NAME)
dim = model.get_sentence_embedding_dimension()
print(f"Embedding dimension: {dim}")
embeddings = model.encode(DOCS, normalize_embeddings=False)
with psycopg.connect(DB_DSN) as conn:
register_vector(conn)
with conn.cursor() as cur:
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema};")
cur.execute(f"""
CREATE TABLE IF NOT EXISTS {schema}.documents (
id bigserial PRIMARY KEY,
content text NOT NULL,
embedding vector({dim}) NOT NULL
);
""")
if RESET:
cur.execute(f"TRUNCATE {schema}.documents;")
print("TRUNCATE done")
rows = [(DOCS[i], embeddings[i]) for i in range(len(DOCS))]
cur.executemany(
f"INSERT INTO {schema}.documents (content, embedding) VALUES (%s, %s);",
rows
)
conn.commit()
print(f"Inserted {len(DOCS)} rows into {schema}.documents")
if __name__ == "__main__":
main()
PY
# 2. Run the loader
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
python ~/load_documents.py
# Verify rows
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" -c "TABLE vecdemo.documents;"
3.2 Run an embedding-based similarity query (as postgres OS user)
▸Step 1. Create ~/query_documents.py
cat > ~/query_documents.py <<'PY'
import os
import re
import sys
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
MODEL_NAME = os.environ.get(
"EMBED_MODEL",
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
query = " ".join(sys.argv[1:]).strip() or "How to tune HNSW ef_search in pgvector"
model = SentenceTransformer(MODEL_NAME)
q = model.encode([query], normalize_embeddings=False)[0]
with psycopg.connect(DB_DSN) as conn:
register_vector(conn)
# Cosine distance operator: <=> (smaller is closer)
sql = f"""
SELECT
id,
content,
(embedding <=> %s) AS cosine_distance,
(1 - (embedding <=> %s)) AS cosine_similarity
FROM {schema}.documents
ORDER BY embedding <=> %s
LIMIT 5;
"""
rows = conn.execute(sql, (q, q, q)).fetchall()
print(f"Query: {query}")
print("-" * 70)
for r in rows:
print(f"[id={r[0]}] dist={r[2]:.6f} sim={r[3]:.6f} | {r[1]}")
if __name__ == "__main__":
main()
PY
▸Step 2. Run the query
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
python ~/query_documents.py "Differences between HNSW and IVFFlat in pgvector"
4. HNSW vs IVFFlat: Conceptual Overview and Performance Analysis
Conceptual Differences
pgvector provides two primary approximate nearest neighbor (ANN) index types:
- HNSW (Hierarchical Navigable Small World)
- IVFFlat (Inverted File with Flat quantization)
Although both are designed to accelerate similarity search, they differ fundamentally in structure and tuning behavior.
HNSW
HNSW builds a multi-layer graph structure where each vector is connected to a set of neighboring vectors. During a search, the algorithm navigates the graph from higher layers down to lower ones, progressively refining the search.
Key characteristics:
- Fast and high recall
- Memory-intensive
- No training phase required
- Tuned primarily with:
- m (graph connectivity)
- ef_construction (index build quality)
- hnsw.ef_search (query-time search breadth)
ef_search controls how many candidates are explored during query time.
Higher values increase recall but also increase latency and memory access.
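To make the role of ef concrete, here is a toy, single-layer best-first graph search in Python. This is an illustrative sketch only (real HNSW uses a multi-layer graph with incremental construction); the point is how a larger ef keeps more candidates alive and therefore touches more nodes:

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.random((200, 8)).astype("float32")

def knn_graph(vecs, m=8):
    # Crude stand-in for HNSW's graph: connect each vector to its m nearest neighbors
    d = np.linalg.norm(vecs[:, None, :] - vecs[None, :, :], axis=2)
    return [list(np.argsort(row)[1:m + 1]) for row in d]

def greedy_search(vecs, graph, q, ef=10, k=5, entry=0):
    # Best-first search: ef bounds the candidate set, like hnsw.ef_search
    dist = lambda i: float(np.linalg.norm(vecs[i] - q))
    visited = {entry}
    frontier = [(dist(entry), entry)]            # min-heap of nodes to expand
    best = [(-dist(entry), entry)]               # max-heap of the ef best so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break                                # frontier is worse than the ef-th best
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)          # drop the current worst candidate
    return sorted((-d, i) for d, i in best)[:k]  # (distance, id), nearest first

graph = knn_graph(vecs)
q = rng.random(8).astype("float32")
result = greedy_search(vecs, graph, q, ef=40, k=5)
print(result)
```

Raising ef widens the beam: more nodes are visited before the stopping condition triggers, which is exactly the recall-versus-latency trade-off measured in Section 5.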
IVFFlat
IVFFlat partitions the dataset into multiple inverted lists (clusters).
Each query probes a subset of these lists to retrieve candidate vectors, then performs exact distance calculations within those lists.
Key characteristics:
- Requires a training phase (k-means style clustering)
- Memory-efficient compared to HNSW
- Tuned primarily with:
- lists (number of clusters at index build time)
- ivfflat.probes (number of clusters searched at query time)
A higher probes value scans more clusters, increasing recall but also increasing latency.
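The same mechanics can be sketched in a few lines of Python. This is an illustrative toy, not pgvector's implementation: a short k-means pass plays the role of the training phase, and probes controls how many lists are scanned exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.random((1000, 16)).astype("float32")

def train_centroids(vecs, lists=10, iters=5):
    # "Training" phase: a few k-means iterations, analogous to building with `lists`
    centroids = vecs[rng.choice(len(vecs), lists, replace=False)]
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(vecs[:, None] - centroids[None], axis=2), axis=1)
        for c in range(lists):
            if np.any(assign == c):
                centroids[c] = vecs[assign == c].mean(axis=0)
    assign = np.argmin(np.linalg.norm(vecs[:, None] - centroids[None], axis=2), axis=1)
    return centroids, assign

def ivf_search(vecs, centroids, assign, q, probes=1, k=5):
    # Probe the nearest `probes` lists, then compute exact distances inside them
    order = np.argsort(np.linalg.norm(centroids - q, axis=1))[:probes]
    cand = np.where(np.isin(assign, order))[0]
    d = np.linalg.norm(vecs[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]], len(cand)

centroids, assign = train_centroids(vecs)
q = rng.random(16).astype("float32")
ids1, scanned1 = ivf_search(vecs, centroids, assign, q, probes=1)
ids5, scanned5 = ivf_search(vecs, centroids, assign, q, probes=5)
print(f"probes=1 scanned {scanned1} vectors; probes=5 scanned {scanned5}")
```

With one probe only a single cluster's members are re-ranked; each additional probe adds a whole list to the scan, which is why IVFFlat's cost grows in step with probes.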
5. Vector index test (HNSW vs IVFFlat)
In Section 3, the test dataset consists of only 10 documents, so indexing does not provide meaningful results.
To properly evaluate indexing performance, a larger dataset needs to be loaded.
In the following section, we will create a large table populated with random vectors (to test index behavior) and then benchmark both HNSW and IVFFlat.
5.1 Bulk load random vectors (Binary COPY)
▸Step 1. Create ~/load_random_items.py
cat > ~/load_random_items.py <<'PY'
import os
import re
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
DB_DSN = os.environ.get("PG_DSN", "host=127.0.0.1 port=5432 dbname=postgres user=postgres")
SCHEMA = os.environ.get("PG_SCHEMA", "vecdemo")
ROWS = int(os.environ.get("ROWS", "200000"))
DIMS = int(os.environ.get("DIMS", "128"))
_ident = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def ident(name: str) -> str:
if not _ident.match(name):
raise ValueError(f"Unsafe identifier: {name}")
return name
def main():
schema = ident(SCHEMA)
print(f"Connecting: {DB_DSN}")
print(f"Generating random embeddings: rows={ROWS}, dims={DIMS}")
embeddings = np.random.rand(ROWS, DIMS).astype("float32")
conn = psycopg.connect(DB_DSN, autocommit=True)
register_vector(conn)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.execute(f"CREATE SCHEMA IF NOT EXISTS {schema};")
conn.execute(f"DROP TABLE IF EXISTS {schema}.items_rand;")
conn.execute(f"CREATE TABLE {schema}.items_rand (id bigserial PRIMARY KEY, embedding vector({DIMS}) NOT NULL);")
print("Loading via binary COPY ...")
cur = conn.cursor()
with cur.copy(f"COPY {schema}.items_rand (embedding) FROM STDIN WITH (FORMAT BINARY)") as copy:
copy.set_types(["vector"])
for i, emb in enumerate(embeddings, 1):
copy.write_row([emb])
if i % 10000 == 0:
print(".", end="", flush=True)
print("\nCOPY done.")
conn.close()
print("Success!")
if __name__ == "__main__":
main()
PY
▸Step 2. Run the loader (example: 200k rows)
source ~/pgvec-venv/bin/activate
export PG_DSN="host=127.0.0.1 port=5432 dbname=postgres user=postgres"
export PG_SCHEMA="vecdemo"
ROWS=200000 DIMS=128 python ~/load_random_items.py
5.2 Benchmark queries with EXPLAIN (ANALYZE, BUFFERS)
▸HNSW: vary ef_search
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" <<'SQL'
DROP INDEX IF EXISTS vecdemo.vecdemo_items_rand_ivf_cos_idx;
CREATE INDEX IF NOT EXISTS vecdemo_items_rand_hnsw_cos_idx
ON vecdemo.items_rand
USING hnsw (embedding vector_cosine_ops);
ANALYZE vecdemo.items_rand;
SET hnsw.ef_search = 40;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
SET hnsw.ef_search = 200;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
RESET hnsw.ef_search;
SQL
▸ IVFFlat: vary probes
psql "host=127.0.0.1 port=5432 dbname=postgres user=postgres" <<'SQL'
DROP INDEX IF EXISTS vecdemo.vecdemo_items_rand_hnsw_cos_idx;
CREATE INDEX IF NOT EXISTS vecdemo_items_rand_ivf_cos_idx
ON vecdemo.items_rand
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 200);
ANALYZE vecdemo.items_rand;
SET ivfflat.probes = 1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
SET ivfflat.probes = 20;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM vecdemo.items_rand
ORDER BY embedding <=> (SELECT embedding FROM vecdemo.items_rand WHERE id = 1000)
LIMIT 10;
RESET ivfflat.probes;
SQL
Expected outcome:
As you increase hnsw.ef_search or ivfflat.probes, you typically trade speed for better recall.
EXPLAIN (ANALYZE, BUFFERS) output should show index usage and measured execution time.
▸ Troubleshooting checklist
- Authentication failed / password prompt
- Confirm .pgpass permission is 0600
- Confirm pg_hba.conf has 127.0.0.1/32 rule for postgres
- Connection refused
- Confirm PostgreSQL is listening on 127.0.0.1 (listen_addresses) and port=5432
- Extension not found
- Confirm make install completed successfully and CREATE EXTENSION vector; works
- Model download issues
- SentenceTransformers downloads models on first run; ensure outbound HTTPS is allowed
5.3 Performance Results from the Test
HNSW: Effect of ef_search

ef_search = 40
• Execution Time: 1.470 ms
• Buffers (shared hit): 1655
• Plan Cost: 89 → 91
ef_search = 200
• Execution Time: 3.580 ms
• Buffers (shared hit): 5721
• Plan Cost: 303 → 305
Interpretation
Increasing ef_search from 40 to 200:
• Latency increased from 1.47 ms → 3.58 ms (~2.4×)
• Buffer hits increased from 1655 → 5721 (~3.5×)
This behavior is expected.
A higher ef_search means:
• More graph nodes are explored
• More candidate vectors are evaluated
• More memory pages are touched
In short:
• Higher recall → Higher CPU work → Higher buffer usage → Higher latency
The performance increase is gradual and predictable, which is one of HNSW's strengths.
IVFFlat: Effect of probes

probes = 1
• Execution Time: 1.344 ms
• Buffers (shared hit): 147
• Plan Cost: 80 → 82
probes = 20
• Execution Time: 6.989 ms
• Buffers (shared hit): 1769
• Plan Cost: 1564 → 1565
Interpretation
Increasing probes from 1 to 20:
• Latency increased from 1.34 ms → 6.99 ms (~5.2×)
• Buffer hits increased from 147 → 1769 (~12×)
This reflects IVFFlat’s structure:
• Each probe adds another inverted list to scan.
• More lists mean more disk/memory page reads.
• More candidates must be re-ranked.
Compared to HNSW, IVFFlat showed a steeper growth in buffer usage and execution time as search breadth increased.
Direct Comparison Under This Dataset
Under the same workload:
| Configuration | Execution Time |
| HNSW (ef=40) | 1.47 ms |
| IVFFlat (probes=1) | 1.34 ms |
| HNSW (ef=200) | 3.58 ms |
| IVFFlat (probes=20) | 6.99 ms |
Observations:
- At low search depth, both indexes perform similarly.
- As search breadth increases:
- HNSW scales more gradually.
- IVFFlat shows sharper cost growth.
- HNSW appears more stable when increasing recall requirements in this dataset.
However:
These results reflect latency only.
Proper evaluation requires measuring recall (accuracy) alongside execution time.
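Recall can be measured by comparing an ANN result set against exact brute-force results for the same query. A minimal sketch (the "ANN" result here is just exact search over a random half of the data, standing in for an index-backed query):

```python
import numpy as np

rng = np.random.default_rng(42)
vecs = rng.random((5000, 32)).astype("float32")
q = rng.random(32).astype("float32")
k = 10

# Ground truth: exact top-k by L2 distance over the full table
exact = set(np.argsort(np.linalg.norm(vecs - q, axis=1))[:k])

# Stand-in for an ANN result: exact search over a random half of the data
sample = rng.choice(len(vecs), len(vecs) // 2, replace=False)
approx = set(sample[np.argsort(np.linalg.norm(vecs[sample] - q, axis=1))[:k]])

recall_at_k = len(exact & approx) / k
print(f"Recall@{k} = {recall_at_k:.2f}")
```

In a real benchmark you would compute the ground truth once per query (e.g. with the index disabled), run the same queries at each ef_search or probes setting, and plot recall against latency.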
Final Takeaway
This test clearly demonstrates:
- Both indexes respond correctly to tuning parameters.
- Increasing search breadth increases buffer usage and latency.
- HNSW scales more smoothly under higher recall configurations.
- IVFFlat can be extremely fast at low probe counts but becomes significantly more expensive as probes increase.
To complete the evaluation, the next logical step is to measure:
- Recall@10
- Recall@K vs latency curves
- Optimal parameter combinations for your workload
In production systems, the right choice depends on:
- Data size
- Dimensionality
- Update frequency
- Acceptable recall level
- Memory budget
There is no universal winner — only workload-dependent trade-offs.
5.4 Practical Guidance
Choose HNSW when:
- High recall is important
- Memory usage is acceptable
- Query latency must remain predictable
- Dataset is frequently updated (no training phase)
Choose IVFFlat when:
- Memory footprint must be controlled
- Dataset is large but relatively static
- You are willing to tune lists and probes
- Slightly lower recall is acceptable for lower cost
Thanks for reading!
I’ll be back soon with another post—see you next time.