Search code by meaning — not keywords
Ask in plain English. Hybrid retrieval combines Gemini embeddings over pgvector with Postgres full-text search to return the closest matches from the indexed corpus.
What actually ships
A working reference implementation of a semantic code search stack. Every card below maps to code in the repo.
Hybrid retrieval
Every query is dispatched to both a Gemini-powered vector search (cosine over pgvector) and a Postgres full-text search. Results are merged with a configurable weighted score (defaults: 0.6 vector / 0.4 text) and reranked.
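A minimal sketch of the weighted merge, assuming both scores have already been normalised to [0, 1] (ts_rank is not bounded that way by default, so a normalisation step is assumed upstream). The names `Hit`, `MergeOptions` and `mergeHits` are hypothetical and not taken from the repo.

```typescript
// Hypothetical shape of one search result; not the repo's actual type.
interface Hit {
  chunkId: string;
  score: number; // assumed normalised to [0, 1]
}

interface MergeOptions {
  vectorWeight?: number; // defaults to 0.6, as stated in the card above
  textWeight?: number; // defaults to 0.4
  topK?: number;
}

// Assumes each input list contains a given chunk id at most once.
function mergeHits(
  vectorHits: Hit[],
  textHits: Hit[],
  options: MergeOptions = {},
): Hit[] {
  const vectorWeight = options.vectorWeight ?? 0.6;
  const textWeight = options.textWeight ?? 0.4;
  const topK = options.topK ?? 10;

  // Accumulate each source's weighted contribution per chunk id, so a chunk
  // found by both searches appears once with a blended score.
  const blended = new Map<string, number>();
  for (const hit of vectorHits) {
    blended.set(hit.chunkId, (blended.get(hit.chunkId) ?? 0) + vectorWeight * hit.score);
  }
  for (const hit of textHits) {
    blended.set(hit.chunkId, (blended.get(hit.chunkId) ?? 0) + textWeight * hit.score);
  }

  // Highest blended score first, truncated to the top K.
  return Array.from(blended, ([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

A chunk that only one search returns still gets a score, but it is capped at that side's weight, so agreement between the two searches is rewarded.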
Asymmetric embeddings
Query embeddings use Gemini's RETRIEVAL_QUERY task type; documents use RETRIEVAL_DOCUMENT. Because queries and documents are phrased differently, asymmetric task types tend to give better recall than using the same type on both sides.
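The two task types can be kept apart with a pair of small request builders. This is a sketch that only constructs the JSON body; the field names (`content`, `taskType`, `outputDimensionality`) follow the public Gemini REST `embedContent` request as commonly documented and should be checked against the current API reference. The helper names are hypothetical.

```typescript
type TaskType = "RETRIEVAL_QUERY" | "RETRIEVAL_DOCUMENT";

// Assumed request shape for the Gemini embedContent endpoint.
interface EmbedRequest {
  model: string;
  content: { parts: { text: string }[] };
  taskType: TaskType;
  outputDimensionality: number;
}

const EMBEDDING_MODEL = "models/gemini-embedding-001";
const EMBEDDING_DIM = 768;

function buildEmbedRequest(text: string, taskType: TaskType): EmbedRequest {
  return {
    model: EMBEDDING_MODEL,
    content: { parts: [{ text }] },
    taskType,
    outputDimensionality: EMBEDDING_DIM,
  };
}

// One builder per side, so the search path cannot accidentally reuse the
// document task type (or the ingestion path the query one).
const forQuery = (text: string): EmbedRequest =>
  buildEmbedRequest(text, "RETRIEVAL_QUERY");
const forDocument = (text: string): EmbedRequest =>
  buildEmbedRequest(text, "RETRIEVAL_DOCUMENT");
```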
pgvector storage
768-dim embeddings are stored alongside content chunks in Postgres. Dimensions are pinned in both the schema and the Gemini client, because pgvector rejects an insert whose vector length does not match the column's declared dimension.
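A dimension mismatch is cheapest to catch before the row reaches Postgres. The sketch below validates an embedding and serialises it to pgvector's text form (`[x,y,z]`). The function name and the 768 constant are illustrative, not copied from the repo.

```typescript
const EMBEDDING_DIM = 768;

// Validate length and values, then format as a pgvector text literal.
function toPgVectorLiteral(
  embedding: number[],
  expectedDim: number = EMBEDDING_DIM,
): string {
  if (embedding.length !== expectedDim) {
    throw new Error(
      `expected ${expectedDim} dimensions, got ${embedding.length}`,
    );
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

The resulting string can be bound as a query parameter and cast on the SQL side, for example `$1::vector`, assuming the column is declared as `vector(768)`.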
BullMQ ingestion
Discovery → ingestion → chunking → embedding, modelled as a queue graph. Each stage is its own worker with its own concurrency and retry config; jobs that exhaust their retries are parked in a dead-letter queue (DLQ) for replay.
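The queue graph can be pictured as a small routing table. This is a dependency-free model, not BullMQ code: in BullMQ the equivalent settings would live in each Worker's `concurrency` option and each job's `attempts` option. All numbers and names below are illustrative assumptions.

```typescript
type Stage = "discovery" | "ingestion" | "chunking" | "embedding";

interface StageConfig {
  concurrency: number;
  maxAttempts: number;
  next: Stage | null; // null marks the final stage
}

// Illustrative values only; the repo's real settings are not shown here.
const PIPELINE: Record<Stage, StageConfig> = {
  discovery: { concurrency: 1, maxAttempts: 3, next: "ingestion" },
  ingestion: { concurrency: 4, maxAttempts: 3, next: "chunking" },
  chunking: { concurrency: 4, maxAttempts: 2, next: "embedding" },
  embedding: { concurrency: 2, maxAttempts: 5, next: null },
};

type Route =
  | { kind: "enqueue"; stage: Stage }
  | { kind: "retry"; stage: Stage }
  | { kind: "dead-letter"; stage: Stage }
  | { kind: "done" };

// Decide what happens to a job after one attempt at a given stage.
function routeJob(stage: Stage, succeeded: boolean, attemptsMade: number): Route {
  const config = PIPELINE[stage];
  if (succeeded) {
    return config.next === null
      ? { kind: "done" }
      : { kind: "enqueue", stage: config.next };
  }
  // A failed job retries until its attempts are used up, then parks in the DLQ.
  return attemptsMade < config.maxAttempts
    ? { kind: "retry", stage }
    : { kind: "dead-letter", stage };
}
```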
AST-aware chunking
Code files are split on syntactic boundaries (functions, classes, methods) via tree-sitter when available, with a paragraph fallback otherwise. Chunks are deduplicated by SHA-256 hash.
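A sketch of the paragraph fallback and the hash-based deduplication, using Node's built-in `crypto` module. The tree-sitter path is not shown, and hashing the trimmed chunk text is an assumption about what the SHA-256 is computed over. Function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  hash: string; // hex-encoded SHA-256 of the chunk text
  text: string;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Paragraph fallback: split wherever one or more blank lines occur.
function splitParagraphs(source: string): string[] {
  return source
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}

// Keep the first occurrence of each distinct chunk, in original order.
function dedupeChunks(texts: string[]): Chunk[] {
  const seen = new Set<string>();
  const chunks: Chunk[] = [];
  for (const text of texts) {
    const hash = sha256Hex(text);
    if (seen.has(hash)) continue;
    seen.add(hash);
    chunks.push({ hash, text });
  }
  return chunks;
}
```

Storing the hash next to the chunk also allows a unique constraint in the database to reject duplicates across ingestion runs, if the schema is set up that way.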
Serverless by default
The web app and API both deploy to Vercel functions. Workers connect to Upstash Redis over TLS; /api/health reports DB, Redis, and Gemini reachability for uptime probes.
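A health report of this kind can be assembled from independent probes. The sketch below takes the probes as injected functions, so nothing here is the repo's actual handler; `NamedProbe`, `HealthReport` and `checkHealth` are hypothetical names.

```typescript
interface NamedProbe {
  name: string;
  run: () => Promise<unknown>; // resolves if the dependency is reachable
}

interface HealthReport {
  ok: boolean;
  checks: Record<string, boolean>;
}

async function checkHealth(probes: NamedProbe[]): Promise<HealthReport> {
  // Run all probes concurrently. A probe that rejects (or throws) is recorded
  // as false instead of failing the whole report.
  const outcomes = await Promise.all(
    probes.map((probe) =>
      Promise.resolve()
        .then(() => probe.run())
        .then(() => true, () => false),
    ),
  );

  const checks: Record<string, boolean> = {};
  probes.forEach((probe, index) => {
    checks[probe.name] = outcomes[index] === true;
  });
  return { ok: outcomes.every(Boolean), checks };
}
```

A route handler could then return HTTP 200 when `ok` is true and 503 otherwise, which is the convention most uptime monitors expect.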
What happens when you press Search
Three stages, end-to-end in a single request.
Embed the query
Gemini gemini-embedding-001 (RETRIEVAL_QUERY, output dimensionality set to 768) returns the query vector.
Two searches in parallel
pgvector cosine similarity against stored chunks AND Postgres ts_rank full-text — awaited together.
Merge and rerank
Deduplicate by chunk id, blend with the weighted score, and return the top K to the browser.
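The three stages can be sketched as one async function. Every dependency is injected, so the embedding call, both searches, and the merge step are stand-ins rather than the repo's real implementations; `SearchDeps` and `hybridSearch` are hypothetical names.

```typescript
interface Hit {
  chunkId: string;
  score: number;
}

// Injected collaborators; signatures are assumptions for illustration.
interface SearchDeps {
  embedQuery: (query: string) => Promise<number[]>;
  vectorSearch: (embedding: number[]) => Promise<Hit[]>;
  textSearch: (query: string) => Promise<Hit[]>;
  merge: (vectorHits: Hit[], textHits: Hit[]) => Hit[];
}

async function hybridSearch(query: string, deps: SearchDeps): Promise<Hit[]> {
  // Stage 1: embed the query text.
  const embedding = await deps.embedQuery(query);

  // Stage 2: start both searches before awaiting either, so they overlap
  // instead of running back to back.
  const [vectorHits, textHits] = await Promise.all([
    deps.vectorSearch(embedding),
    deps.textSearch(query),
  ]);

  // Stage 3: deduplicate, blend, and truncate to the top K.
  return deps.merge(vectorHits, textHits);
}
```

Note that `Promise.all` rejects as soon as either search fails; a variant that degrades to whichever side succeeded would need separate error handling per search.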
Poke around, break it, read the code
Try the deployed search, skim the architecture notes in the docs, or clone the repo and run it locally — Docker Compose spins up Postgres and Redis in one command.
Portfolio reference build · no accounts, no email list, no billing — just a live deploy of the code in the repo.