HypVector is a JavaScript library for storing and querying embedding vectors directly out of Apache Parquet files. It builds on hyparquet and hyparquet-writer so that a Parquet file on S3 (or local disk) acts as the vector database. Any client can run similarity search over HTTP range requests, without a server in between.
Part of HypStack, an open-source stack for AI observability.
- Works in browsers and Node.js
- Self-describing files (dimension, metric, normalization, cluster centroids in Parquet KV metadata)
- Exact and approximate (binary + cluster + rerank) search out of the box
- Minimizes data fetching using HTTP range requests
- Includes TypeScript definitions
At 156k 384-dim wiki embeddings (249 MB), a single top-10 query reads ~6 MB across ~160 ranged HTTP fetches with ~91% recall against an exact full scan. Over a localhost HTTP server with 20 ms of injected per-request latency, the rerank path lands at ~140 ms/query vs ~360 ms for an exact full scan.
Vector search over 3,199,860 OpenAI embeddings (1024-dim) of real LLM conversations (WildChat-4.8M), top-10 recall against exact truth. Every competitor was queried over the network, the way it is actually deployed. hypvector keeps the vectors in object storage and runs the query in the client, so there is no server and no idle cost.
| Engine | Storage | Recall@10 | Warm query (p50) | All-in / mo | Server |
|---|---|---|---|---|---|
| hypvector | 13.7 GB | 0.925 | 147 ms | ~$0.32 | none |
| Pinecone | 13.1 GB | 0.920 | 85 ms | $50 min | managed |
| turbopuffer | 13.1 GB | 0.915 | 198 ms | $16 min | managed |
| S3 Vectors | 13.1 GB | 0.905 | 133 ms | ~$0.79 | serverless |
| pgvector | 41.9 GB | 0.870 | 80 ms | $372 | r5.2xlarge 24/7 |
| Qdrant | 13.1 GB | 0.865 | 70 ms | $186 | r5.xlarge 24/7 |
The managed and always-on engines keep the index hot to answer fast, which is what the monthly bill pays for. hypvector trades a little latency for zero idle cost and no infrastructure.
In the browser, pass a URL string as source and HypVector wraps it as a cached async buffer for ranged HTTP reads. Embed the query with the same model used at write time:
const { searchVectors } = await import('https://cdn.jsdelivr.net/npm/hypvector/src/index.js')
const results = await searchVectors({
source: 'https://example.com/vectors.parquet',
query: queryVec, // Float32Array of length `dimension`
topK: 10,
})
for (const { id, score } of results) {
console.log(score, id)
}
To search a local Parquet file in a Node.js environment, pass a file path:
import { searchVectors } from 'hypvector'
const results = await searchVectors({
source: 'vectors.parquet',
query: queryVec,
topK: 10,
})
Note: hypvector is published as an ES module.
Create a Parquet file from any sync or async iterable of { id, vector }:
import { fileWriter } from 'hyparquet-writer'
import { writeVectors } from 'hypvector'
await writeVectors({
writer: fileWriter('vectors.parquet'),
dimension: 384,
// normalize defaults to true: L2-normalize on write, lets search skip sqrt for cosine.
// Pass normalize: false only if you need raw magnitudes (e.g. dot/euclidean on unnormalized vectors).
vectors: myEmbedder(), // any sync or async iterable of { id, vector }
})
By default, writeVectors adds the binary sign-bit column and clusters rows automatically once the corpus crosses ~10k vectors. Below that, files are written as plain id + vector columns and search uses an exact full scan. To control these manually, pass binary: true/false and clusters: <n>; passing either disables the auto behavior for that knob. When the binary column is written, pageSize defaults to 32 KB so offset-index reads during search fetch tight ranges.
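
To make the sign-bit column concrete, here is a minimal sketch of 1-bit quantization and the Hamming distance used to compare codes. The names `toSignBits` and `hamming`, and the bit order (dimension `i` maps to bit `i % 8` of byte `i >> 3`), are assumptions for illustration; they are not taken from HypVector's implementation or on-disk layout.

```javascript
// Pack the sign of each component into dim/8 bytes (assumes dim is a multiple of 8).
// Bit order is an assumption: dimension i maps to bit (i % 8) of byte floor(i / 8).
function toSignBits(vector) {
  const code = new Uint8Array(vector.length / 8)
  for (let i = 0; i < vector.length; i += 1) {
    if (vector[i] > 0) code[i >> 3] |= 1 << (i & 7)
  }
  return code
}

// Hamming distance: number of differing bits between two equal-length codes.
function hamming(a, b) {
  let dist = 0
  for (let i = 0; i < a.length; i += 1) {
    let x = a[i] ^ b[i]
    while (x) {
      x &= x - 1 // clear the lowest set bit
      dist += 1
    }
  }
  return dist
}

// Two 8-dim vectors that differ in sign at dimensions 1 and 3.
const a = toSignBits(Float32Array.of(0.5, -0.1, 0.3, 0.9, -0.7, 0.2, -0.4, 0.8))
const b = toSignBits(Float32Array.of(0.4, 0.2, 0.3, -0.9, -0.6, 0.1, -0.5, 0.7))
console.log(a, b, hamming(a, b))
```

At 384 dimensions a code like this takes 48 bytes per row against 1,536 bytes for the float32 vector, which is why a first pass over the binary column is cheap to fetch.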
HypVector is BYO-embedding: you decide which model produces the vectors. It just stores { id, vector } pairs and queries them. The only contracts are:
- Same model on write and query. Embeddings from different models aren't comparable.
- Same `dimension` for every record (must match the `dimension` you pass to `writeVectors`).
- `normalize` defaults to `true`, which is the right choice when your model's vectors aren't already unit-length and you intend to query with cosine; it saves the per-candidate sqrt at query time. If your model already normalizes (most modern sentence-transformer models do), the default is harmless and simply records the flag in KV metadata. Pass `normalize: false` only when you want to preserve raw magnitudes for `dot`/`euclidean`.
The natural shape is an async generator that yields embedded records as you batch them through your embedder.
import { pipeline } from '@huggingface/transformers'
import { fileWriter } from 'hyparquet-writer'
import { writeVectors } from 'hypvector'
const extract = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')
async function* embed(docs, batchSize = 32) {
for (let i = 0; i < docs.length; i += batchSize) {
const batch = docs.slice(i, i + batchSize)
const out = await extract(batch.map(d => d.text), { pooling: 'mean', normalize: true })
for (let j = 0; j < batch.length; j += 1) {
yield { id: batch[j].id, vector: out.data.slice(j * 384, (j + 1) * 384) }
}
}
}
await writeVectors({
writer: fileWriter('vectors.parquet'),
dimension: 384,
normalize: true,
vectors: embed(docs),
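})
The same generator shape works against a hosted embedding API, here OpenAI's text-embedding-3-small:
async function* embed(docs, batchSize = 96) {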
for (let i = 0; i < docs.length; i += batchSize) {
const batch = docs.slice(i, i + batchSize)
const res = await fetch('https://api.openai.com/v1/embeddings', {
method: 'POST',
headers: { authorization: `Bearer ${process.env.OPENAI_API_KEY}`, 'content-type': 'application/json' },
body: JSON.stringify({ model: 'text-embedding-3-small', input: batch.map(d => d.text) }),
})
const { data } = await res.json()
for (let j = 0; j < batch.length; j += 1) {
yield { id: batch[j].id, vector: Float32Array.from(data[j].embedding) }
}
}
}
See scripts/embed.js for a working version that streams 156k wiki rows through MiniLM and writes the result.
Stream every { id, vector } record back out for inspection or migration:
import { asyncBufferFromFile } from 'hyparquet'
import { readVectors } from 'hypvector'
const file = await asyncBufferFromFile('vectors.parquet')
for await (const { id, vector } of readVectors({ file })) {
console.log(id, vector.slice(0, 4))
}
searchVectors accepts the following options:
const results = await searchVectors({
source: 'https://example.com/vectors.parquet', // URL, local file path, or an open AsyncBuffer
query: queryVec, // Float32Array of length `dimension`
topK: 10,
algorithm: 'auto', // 'auto' | 'exact' | 'binary'
rerankFactor: 10, // candidate pool = topK * rerankFactor (default 10). Set to 0 to force exact full scan.
probe: 0.25, // fraction of clusters to scan in phase 1 (default 0.25). Set to 1 to scan all clusters; pass an integer > 1 for an absolute count.
})
- `metric: 'cosine' | 'dot' | 'euclidean'` overrides the metric stored in the file.
- `source` accepts a URL string, a local file path, or an already-opened `AsyncBuffer`. When a string is passed, the default factory wraps the buffer in `cachedAsyncBuffer` so repeated reads of the footer / offset indexes are served from memory.
- For repeated queries against the same file, open the `AsyncBuffer` and parse `metadata` once, then pass both: `searchVectors({ source: file, metadata, query, ... })`. This skips the per-query footer fetch and metadata parse.
Core columns: id (STRING), vector (FIXED_LEN_BYTE_ARRAY(4 × dim), raw float32 bytes, UNCOMPRESSED), and an optional ANN column: vector_bin (FIXED_LEN_BYTE_ARRAY(dim/8), 1 bit per dim) when binary: true.
Exact search path (no binary column, or rerankFactor: 0): single pass over the float32 column via parquetRead({ onChunk }). Each row-group's decoded Uint8Array[] shares a backing buffer, so we view it as one aligned Float32Array and stride by dim, with zero per-row allocations.
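
The strided scan can be sketched on an in-memory buffer as follows. This is an illustration under stated assumptions, not HypVector's code: `exactTopK` is a hypothetical helper, it scores by dot product only, and it works on a single flat array rather than on chunks delivered by `parquetRead`.

```javascript
// Score every row of a flat float32 buffer against the query and keep the best k.
// Row r occupies flat[r * dim .. (r + 1) * dim), so no per-row array is created.
function exactTopK(flat, dim, query, k) {
  const rows = flat.length / dim
  const top = [] // sorted by descending score, at most k entries
  for (let r = 0; r < rows; r += 1) {
    const base = r * dim
    let score = 0
    for (let d = 0; d < dim; d += 1) score += flat[base + d] * query[d]
    if (top.length < k || score > top[top.length - 1].score) {
      // Insert at the sorted position, then trim back to k entries.
      let pos = top.length
      while (pos > 0 && top[pos - 1].score < score) pos -= 1
      top.splice(pos, 0, { row: r, score })
      if (top.length > k) top.pop()
    }
  }
  return top
}

// Raw bytes as they might arrive from a fixed-width column: 3 rows of dim 2.
const bytes = new Uint8Array(Float32Array.of(1, 0, 0, 1, 0.6, 0.8).buffer)
// View the bytes as floats without copying (byteOffset must be a multiple of 4).
const flat = new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4)
const top = exactTopK(flat, 2, Float32Array.of(0, 1), 2)
console.log(top.map(t => t.row))
```

The `Float32Array` view is the important part: it reinterprets the column bytes in place, so the cost per row is just the inner multiply-add loop.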
Binary + cluster + rerank path (default when `binary: true`):
- Build-time clustering (when `clusters > 0`): k-means on the 1-bit codes using Hamming distance and bit-majority voting. Cluster ids are then renumbered via a greedy nearest-neighbor walk so that adjacent ids have similar centroids. This makes the top-N nearest clusters at query time tend to land in fewer contiguous row ranges. Rows are sorted by the new cluster id. Centroids and per-cluster row counts go into KV metadata.
- Phase 1, cluster pruning: rank clusters by Hamming(query, centroid), pick the top `probe` fraction, and Hamming-scan only those clusters' row ranges. With 32 KB pages and `useOffsetIndex`, hyparquet fetches only the pages covering each cluster's rows.
- Phase 2, float32 rerank: collect the top `topK × rerankFactor` candidate row indices, coalesce them into contiguous runs (merging gaps ≤ 64 rows), and issue one ranged `parquetRead` per run for the `vector` column only. Score under the exact metric.
- Phase 3, id lookup: fetch the `id` column for only the top-K winners (the id column is variable-length and reading it for every candidate doubles phase-2 cost).
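
Three pieces of bookkeeping from the phases above can be sketched in a few lines: turning `probe` into a cluster count, deriving each cluster's row range from the per-cluster counts, and coalescing candidate rows into runs. The helper names (`probeCount`, `clusterRanges`, `coalesce`), the rounding of fractional probes, and the exact definition of a "gap" are assumptions made for illustration, not HypVector's implementation.

```javascript
// Turn the `probe` option into a number of clusters to scan: values <= 1 are
// a fraction, integers > 1 an absolute count. Rounding up is an assumption.
function probeCount(probe, clusters) {
  const n = probe <= 1 ? Math.ceil(probe * clusters) : probe
  return Math.max(1, Math.min(clusters, n))
}

// Rows are sorted by cluster id, so cluster c covers a contiguous row range
// given by the running sum of the per-cluster row counts.
function clusterRanges(counts) {
  const ranges = []
  let start = 0
  for (const count of counts) {
    ranges.push({ rowStart: start, rowEnd: start + count })
    start += count
  }
  return ranges
}

// Merge candidate rows into [rowStart, rowEnd) runs, bridging gaps of at most
// maxGap skipped rows so that nearby candidates share one ranged read.
function coalesce(rows, maxGap = 64) {
  const runs = []
  for (const row of [...rows].sort((x, y) => x - y)) {
    const last = runs[runs.length - 1]
    if (last && row - last.rowEnd <= maxGap) last.rowEnd = row + 1
    else runs.push({ rowStart: row, rowEnd: row + 1 })
  }
  return runs
}

console.log(probeCount(0.25, 128))
console.log(clusterRanges([3, 5, 2]))
console.log(coalesce([500, 10, 75, 12, 141]))
```

Coalescing trades a few wasted rows for fewer requests: reading up to 64 unneeded float32 vectors is usually cheaper than paying another round trip.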
A cachedAsyncBuffer deduplicates footer / offset-index byte ranges across all the parallel parquetRead calls.
For pre-normalized vectors with metric: 'cosine', the search normalizes the query once and scores via dot product to skip the per-candidate sqrt loop.
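
The reason this works is that cosine similarity of two unit-length vectors is exactly their dot product. A small sketch, with hypothetical helper names (`normalize`, `dot`, `cosine`) that are not part of HypVector's API:

```javascript
// L2-normalize a vector in place and return it; zero vectors are left unchanged.
function normalize(v) {
  let sumSq = 0
  for (let i = 0; i < v.length; i += 1) sumSq += v[i] * v[i]
  const norm = Math.sqrt(sumSq)
  if (norm > 0) for (let i = 0; i < v.length; i += 1) v[i] /= norm
  return v
}

function dot(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i += 1) sum += a[i] * b[i]
  return sum
}

// Full cosine needs both norms, hence a sqrt for every candidate.
function cosine(a, b) {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b))
}

const stored = normalize(Float32Array.of(3, 4)) // done once, at write time
const query = normalize(Float32Array.of(4, 3)) // done once per query
const fast = dot(stored, query) // per-candidate work: multiply-adds only
const slow = cosine(Float32Array.of(3, 4), Float32Array.of(4, 3))
console.log(fast, slow)
```

Both values agree up to float32 rounding, so ranking by dot product on normalized data gives the same order as ranking by cosine.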
| Column | Type | Bytes per row | When written |
|---|---|---|---|
| `id` | `STRING (UTF8)` | variable | always |
| `vector` | `FIXED_LEN_BYTE_ARRAY(4 × dim)` | 4 × dim | always |
| `vector_bin` | `FIXED_LEN_BYTE_ARRAY(dim/8)` | dim/8 | when `binary: true` |
Key-value metadata:
| Key | Value |
|---|---|
| `hypvector.version` | format version (currently 0) |
| `hypvector.dimension` | length of each vector |
| `hypvector.metric` | `cosine` \| `dot` \| `euclidean` |
| `hypvector.normalized` | `true` if vectors were L2-normalized on write |
| `hypvector.binary` | `true` if the `vector_bin` column is present |
| `hypvector.clusters` | number of k-means clusters (0 if not clustered) |
| `hypvector.centroids` | base64-encoded centroid binary codes (clusters × dim/8 bytes); present when clusters > 0 |
| `hypvector.clusterCounts` | base64-encoded Uint32Array of per-cluster row counts; present when clusters > 0 |
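
As a rough illustration of how a reader might decode the two base64 values, the sketch below splits the centroid blob into per-cluster codes and reads the row counts. The helper names, the sample metadata object, and the little-endian byte order of the counts are assumptions; check the library source before relying on them.

```javascript
// Decode a base64 string into bytes (atob is available in browsers and Node.js 16+).
function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), ch => ch.charCodeAt(0))
}

// Split the centroid blob into one dim/8-byte code per cluster.
function decodeCentroids(b64, clusters, dim) {
  const bytes = base64ToBytes(b64)
  const width = dim / 8
  if (bytes.length !== clusters * width) throw new Error('unexpected centroid blob size')
  return Array.from({ length: clusters }, (_, c) => bytes.subarray(c * width, (c + 1) * width))
}

// Read per-cluster row counts; little-endian byte order is an assumption.
function decodeCounts(b64) {
  const bytes = base64ToBytes(b64)
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  return Array.from({ length: bytes.length / 4 }, (_, i) => view.getUint32(i * 4, true))
}

// Hypothetical metadata for 2 clusters over 16-dim vectors (values are made up).
const kv = {
  'hypvector.dimension': '16',
  'hypvector.clusters': '2',
  'hypvector.centroids': btoa(String.fromCharCode(0xff, 0x00, 0x0f, 0xf0)),
  'hypvector.clusterCounts': btoa(String.fromCharCode(3, 0, 0, 0, 0, 1, 0, 0)),
}
const centroids = decodeCentroids(
  kv['hypvector.centroids'],
  Number(kv['hypvector.clusters']),
  Number(kv['hypvector.dimension'])
)
const counts = decodeCounts(kv['hypvector.clusterCounts'])
console.log(centroids, counts)
```

Because rows are sorted by cluster id, a running sum over `counts` is enough to recover each cluster's row range without any extra index.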
npx hypvector vectors.parquet
Prints format version, vector count, dimension, metric, whether a binary column is present, cluster count, and storage overhead.
The default rerankFactor of 10 is tuned for the hundreds-of-thousands range. As N grows, more binary candidates tie at the same Hamming distance and a wider phase-1 pool is needed to keep recall up. On a 1M synthetic dataset (256 true clusters, Gaussian noise):
| rerankFactor | candidates fetched | ms | recall@10 |
|---|---|---|---|
| 10 | 100 | 41 | 18% |
| 30 | 300 | 58 | 32% |
| 100 | 1,000 | 155 | 68% |
| 300 | 3,000 | 443 | 98% |
Rough rule: rerankFactor ≈ max(10, N / 3000). At 1M that's ~333, giving ~98% recall at ~440 ms, still roughly 2× faster than the 950 ms exact scan.
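
The rule of thumb is easy to apply when you know the row count up front. A minimal sketch, where `suggestRerankFactor` is a hypothetical helper (not part of the library) and rounding to an integer is an assumption:

```javascript
// Rule of thumb from the text: rerankFactor ≈ max(10, N / 3000).
function suggestRerankFactor(rowCount) {
  return Math.max(10, Math.round(rowCount / 3000))
}

for (const n of [30_000, 1_000_000]) {
  const rerankFactor = suggestRerankFactor(n)
  // With topK = 10 the phase-2 candidate pool is topK * rerankFactor rows.
  console.log(n, rerankFactor, 10 * rerankFactor)
}
```

The result can be passed as the `rerankFactor` option to `searchVectors`; treat it as a starting point and check recall on your own data.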
Vector search over 837,989 real LLM conversations (WildChat-1M), run against the same data on every engine. hypvector keeps the vectors in a Parquet file in object storage and runs the query in the client, so there is no server and no idle cost.
| Engine | Storage | Recall@10 | Query | All-in / mo | Server |
|---|---|---|---|---|---|
| hypvector | 3.58 GB | 0.975 | 46 ms † | ~$0.08 | none |
| pgvector | 11.5 GB | 0.965 | ~1 ms † | $94 | r5.large 24/7 |
| Qdrant | 3.6 GB | 0.965 | 2 ms † | $62 | t3.large 24/7 |
| turbopuffer | 3.43 GB | 0.93 | 60 ms ‡ | $16 min | managed |
| Pinecone | 3.43 GB | 0.97 | 125 ms ‡ | $50 min | managed |
† local compute, no network. ‡ live cloud, includes real internet round-trip.
The always-on engines win raw latency by keeping a hot index in RAM, which is what the monthly bill pays for. hypvector trades that for zero idle cost, a smaller footprint, and no infrastructure.
Measured on the 156k 384-dim wiki dataset, local file.
From scripts/ablation.js (write-side optimizations):
| Variant | File MB | Query ms | Fetches | MB read | Recall@10 |
|---|---|---|---|---|---|
| base (vector + id), forced exact scan | 241.5 | 108 | 33 | 242.0 | 100% |
| + binary (phase 1 + 2 rerank) | 249.3 | 48 | 136 | 11.7 | 93% |
| + cluster (default; probe=0.25, clusters=128) | 249.4 | 15 | 162 | 6.2 | 91% |
From scripts/bench-http.js (localhost HTTP server with +20 ms per-request RTT, same 156k file):
| Search | ms/query |
|---|---|
| Exact full scan | 362 |
| Rerank probe=0.5 | 152 |
| Rerank probe=0.25 (default) | 139 |
| Rerank probe=0.1 | 129 |
From scripts/ablation-search.js (same data, toggling search-side knobs):
| Search variant | Query ms | Fetches | MB read |
|---|---|---|---|
| baseline (all opts on) | 22 | 100 | 5.5 |
| -coalesce (one parquetRead per candidate) | 34 | 133 | 4.9 |
| -deferId (fetch ids alongside vectors) | 50 | 117 | 5.8 |
Trade query speed for recall via the probe knob (fraction of clusters scanned):
| probe | ms | fetches | MB | recall |
|---|---|---|---|---|
| 0.05 | 9 | 47 | 3.7 | 78% |
| 0.10 | 11 | 59 | 4.2 | 84% |
| 0.25 (default) | 16 | 79 | 5.2 | 91% |
| 0.50 | 21 | 94 | 6.5 | 94% |
| 1.00 (all clusters) | 29 | 70 | 8.3 | 94% |
hypvector isn't a hosted service. The closest peers are:
| Engine | Server? | Cold p50 | Warm p50 | Fixed $/mo |
|---|---|---|---|---|
| hypvector | none, file on S3 | ~500 ms (CloudFront, home WAN) | same, no cache | $0 |
| LanceDB (S3 mode) | none, embedded | bandwidth-bound | sub-50 ms (local) | $0 |
| turbopuffer | hosted | ~440 ms p90 | ~8 ms | $16 min |
| Pinecone Serverless | hosted | 200 ms – 2 s | 50–100 ms | $0 + per-RU |
| Cloudflare Vectorize | hosted (edge) | needs pre-warm | edge-fast | $0 + per-op |
Use hypvector for static datasets, browser-side search, or low-QPS where a hosted service's minimum spend dwarfs the actual cost. Reach for a hosted service when you need sub-10 ms warm latency at sustained QPS, frequent upserts, or filter-aware recall at scale.
- hyparquet: Parquet reading
- hyparquet-writer: Parquet writing
- hyparquet-compressors: Compression codecs
- Apache Parquet: Columnar storage format
Contributions are welcome! If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.
