CodeSensei
Semantic code search · portfolio reference build

Search code by meaning — not keywords

Ask in plain English. Hybrid retrieval combines Gemini embeddings over pgvector with Postgres full-text to return the closest matches from the indexed corpus.

Retrieval: vector + full-text, hybrid reranked
Embeddings: Gemini 768-dim, gemini-embedding-001
Storage: Postgres + pgvector, Supabase-hosted
Runtime: NestJS on Vercel, BullMQ + Upstash

What actually ships

A working reference implementation of a semantic code search stack. Every card below maps to code in the repo.

Hybrid retrieval

Every query is dispatched to both a Gemini-powered vector search (cosine over pgvector) and a Postgres full-text search. Results are merged with a configurable weighted score (defaults: 0.6 vector / 0.4 text) and reranked.
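The weighted blend can be sketched as below. This is a minimal illustration, not the repo's code: the names `HybridWeights`, `normalise`, and `hybridScore` are hypothetical, and the min-max rescaling step is an assumption (ts_rank values are not on the same scale as cosine similarity, so some rescaling is usually needed before blending). Only the 0.6 / 0.4 defaults come from the text.

```typescript
// Hypothetical names; only the 0.6 / 0.4 defaults come from the text above.
interface HybridWeights {
  vector: number;
  text: number;
}

const DEFAULT_WEIGHTS: HybridWeights = { vector: 0.6, text: 0.4 };

// Assumption: raw scores from the two sources are rescaled to [0, 1]
// (min-max) so that a weight of 0.4 means the same thing for both.
function normalise(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted sum of one chunk's (already normalised) vector and text scores.
function hybridScore(
  vectorScore: number,
  textScore: number,
  weights: HybridWeights = DEFAULT_WEIGHTS,
): number {
  return weights.vector * vectorScore + weights.text * textScore;
}
```

A chunk that only the vector search found would be scored as `hybridScore(s, 0)`, so it can still rank, just with a ceiling of 0.6 under the default weights.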

Asymmetric embeddings

Query embeddings use Gemini's RETRIEVAL_QUERY task type; documents use RETRIEVAL_DOCUMENT. The two task types are designed to be paired, and pairing them generally gives better recall than using the same type on both sides.
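One way to keep the two sides from drifting apart is to build both request bodies through a single helper. The sketch below only constructs the JSON payload and makes no network call. The field names (`taskType`, `outputDimensionality`) follow the public Gemini REST `embedContent` request shape as commonly documented, so treat them as an assumption to verify against the current API reference; the helper names are hypothetical.

```typescript
// Task types named in the text above.
type EmbeddingTaskType = "RETRIEVAL_QUERY" | "RETRIEVAL_DOCUMENT";

// Assumed request shape for the Gemini REST embedContent endpoint.
interface EmbedRequestBody {
  model: string;
  content: { parts: { text: string }[] };
  taskType: EmbeddingTaskType;
  outputDimensionality: number;
}

const EMBEDDING_MODEL = "models/gemini-embedding-001";
const EMBEDDING_DIM = 768;

// Hypothetical helper: the only thing that differs between the query
// side and the document side is the task type.
function buildEmbedRequest(text: string, taskType: EmbeddingTaskType): EmbedRequestBody {
  return {
    model: EMBEDDING_MODEL,
    content: { parts: [{ text }] },
    taskType,
    outputDimensionality: EMBEDDING_DIM,
  };
}

const forQuery = (text: string): EmbedRequestBody => buildEmbedRequest(text, "RETRIEVAL_QUERY");
const forDocument = (text: string): EmbedRequestBody => buildEmbedRequest(text, "RETRIEVAL_DOCUMENT");
```

The search path would call `forQuery` and the ingestion workers `forDocument`, so the model name and dimension stay identical by construction.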

pgvector storage

768-dim embeddings are stored alongside content chunks in Postgres. The dimension is pinned in both the schema and the Gemini client, because pgvector rejects an insert whose vector length does not match the column's declared dimension.
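A length check in application code surfaces a dimension mismatch before the row ever reaches Postgres. The sketch below is illustrative: `toPgVectorLiteral` is a hypothetical helper, and it assumes embeddings are sent to pgvector in its bracketed text form (`[0.1,0.2,...]`).

```typescript
const EXPECTED_DIM = 768;

// Hypothetical helper: validates an embedding and serialises it to
// pgvector's text representation, e.g. "[0.1,0.2,0.3]".
function toPgVectorLiteral(embedding: number[], expectedDim: number = EXPECTED_DIM): string {
  if (embedding.length !== expectedDim) {
    throw new Error(
      `embedding has ${embedding.length} dimensions, expected ${expectedDim}`,
    );
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}
```

The returned string would typically be bound as a query parameter and cast with `::vector` on the SQL side.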

BullMQ ingestion

Discovery → ingestion → chunking → embedding modelled as a queue graph. Each stage is its own worker with its own concurrency and retry config; failed jobs park in a DLQ for replay.
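The queue graph and its per-stage settings can be pictured as plain data plus a small failure policy. The sketch below deliberately avoids the BullMQ API and models only the decision logic; every number, the `-dlq` queue naming, and the function names are hypothetical. It assumes the dead-letter queue is a separate queue that a job is moved to once its attempts are exhausted.

```typescript
type Stage = "discovery" | "ingestion" | "chunking" | "embedding";

interface StageConfig {
  concurrency: number; // parallel jobs per worker
  attempts: number;    // total tries before giving up
  backoffMs: number;   // base delay for exponential backoff
}

// Order of the stages as described in the text.
const PIPELINE: Stage[] = ["discovery", "ingestion", "chunking", "embedding"];

// Illustrative values only; the real settings live in the repo.
const STAGE_CONFIG: Record<Stage, StageConfig> = {
  discovery: { concurrency: 1, attempts: 3, backoffMs: 1000 },
  ingestion: { concurrency: 4, attempts: 3, backoffMs: 1000 },
  chunking: { concurrency: 4, attempts: 2, backoffMs: 500 },
  embedding: { concurrency: 2, attempts: 5, backoffMs: 2000 },
};

// Which queue a successful job feeds next; null after the last stage.
function nextStage(stage: Stage): Stage | null {
  const i = PIPELINE.indexOf(stage);
  if (i < 0) return null;
  return PIPELINE[i + 1] ?? null;
}

type FailureAction =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter"; queue: string };

// Retry with exponential backoff until attempts run out, then park the job.
function onFailure(stage: Stage, attemptsMade: number): FailureAction {
  const cfg = STAGE_CONFIG[stage];
  if (attemptsMade >= cfg.attempts) {
    return { action: "dead-letter", queue: `${stage}-dlq` };
  }
  return { action: "retry", delayMs: cfg.backoffMs * Math.pow(2, attemptsMade - 1) };
}
```

Keeping the policy as data makes it easy to give the rate-limited embedding stage more attempts and a longer backoff than the purely local chunking stage.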

AST-aware chunking

Code files are split on syntactic boundaries (functions, classes, methods) via tree-sitter when a grammar is available, with a paragraph-based fallback otherwise. Chunks are deduplicated by the SHA-256 hash of their content.
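The fallback path and the deduplication step are simple enough to sketch without tree-sitter. The example below is a minimal illustration using Node's built-in `crypto` module; the function names are hypothetical, and it assumes "paragraph" means text separated by one or more blank lines.

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  hash: string;    // SHA-256 of the content, hex-encoded
  content: string;
}

// Paragraph fallback: split on blank lines and drop empty pieces.
function splitParagraphs(source: string): string[] {
  return source
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Keep the first occurrence of each distinct chunk, preserving order.
function dedupeChunks(pieces: string[]): Chunk[] {
  const seen = new Set<string>();
  const out: Chunk[] = [];
  for (const content of pieces) {
    const hash = sha256Hex(content);
    if (seen.has(hash)) continue;
    seen.add(hash);
    out.push({ hash, content });
  }
  return out;
}
```

Storing the hash next to the chunk also lets a re-ingestion run skip the embedding call for content it has already seen.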

Serverless by default

Web + API both deploy to Vercel functions. Workers run against Upstash Redis with TLS; /api/health reports DB + Redis + Gemini reachability for uptime probes.
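A health endpoint of this kind usually runs its dependency probes concurrently and reports each one separately. The sketch below shows only that aggregation; the probes are injected as functions, so no real database, Redis, or Gemini client appears here. The report shape, the 2-second timeout, and all names are assumptions, not the repo's actual response format.

```typescript
type Check = () => Promise<void>; // resolves if reachable, rejects otherwise
type Status = "up" | "down";

interface HealthReport {
  status: "ok" | "degraded";
  checks: Record<string, Status>;
}

// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Run every probe in parallel; one failing probe never hides the others.
async function runHealthChecks(
  checks: Record<string, Check>,
  timeoutMs = 2000,
): Promise<HealthReport> {
  const pairs = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, Status]> => {
      try {
        await withTimeout(Promise.resolve().then(check), timeoutMs);
        return [name, "up"];
      } catch {
        return [name, "down"];
      }
    }),
  );
  const result: Record<string, Status> = {};
  for (const [name, status] of pairs) result[name] = status;
  const allUp = pairs.every(([, status]) => status === "up");
  return { status: allUp ? "ok" : "degraded", checks: result };
}
```

An uptime probe can then alert on the top-level `status` while the per-dependency map shows which service is actually down.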

What happens when you press Search

Three stages, end-to-end in a single request.

1

Embed the query

Gemini gemini-embedding-001 (RETRIEVAL_QUERY) returns a 768-dim vector.

2

Two searches in parallel

pgvector cosine similarity against stored chunks and Postgres ts_rank full-text search, started concurrently and awaited together.
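Running the two searches concurrently comes down to a single `Promise.all`. In the sketch below the database calls are injected as functions so the example stays self-contained. The SQL strings are illustrative only: the table and column names (`chunks`, `embedding`, `content_tsv`) and the `'english'` text-search configuration are hypothetical. `<=>` is pgvector's cosine distance operator, so similarity is taken as one minus the distance.

```typescript
interface Hit {
  chunkId: string;
  score: number;
}

// Illustrative queries; schema names are hypothetical.
const VECTOR_SQL = `
  SELECT id, 1 - (embedding <=> $1::vector) AS score
  FROM chunks
  ORDER BY embedding <=> $1::vector
  LIMIT $2`;

const TEXT_SQL = `
  SELECT id, ts_rank(content_tsv, plainto_tsquery('english', $1)) AS score
  FROM chunks
  WHERE content_tsv @@ plainto_tsquery('english', $1)
  ORDER BY score DESC
  LIMIT $2`;

interface SearchDeps {
  vectorSearch: (embedding: number[], limit: number) => Promise<Hit[]>;
  textSearch: (query: string, limit: number) => Promise<Hit[]>;
}

// Both searches start immediately; total latency is roughly the slower of the two.
async function searchBoth(
  deps: SearchDeps,
  query: string,
  queryEmbedding: number[],
  limit: number,
): Promise<{ vectorHits: Hit[]; textHits: Hit[] }> {
  const [vectorHits, textHits] = await Promise.all([
    deps.vectorSearch(queryEmbedding, limit),
    deps.textSearch(query, limit),
  ]);
  return { vectorHits, textHits };
}
```

Note that `Promise.all` rejects as soon as either search fails; a variant that tolerates one side failing would need to catch per-branch errors instead.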

3

Merge and rerank

Deduplicate by chunk id, blend with weighted score, return the top K to the browser.
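The merge step can be sketched as a map keyed by chunk id. This is a minimal illustration under stated assumptions: scores from both sources are taken to be already normalised to [0, 1], a chunk missing from one source contributes 0 for that source, and the names `mergeAndRank` and `RankedHit` are hypothetical. The 0.6 / 0.4 defaults are the ones quoted earlier on this page.

```typescript
interface Hit {
  chunkId: string;
  score: number;
}

interface RankedHit {
  chunkId: string;
  vectorScore: number;
  textScore: number;
  score: number; // blended score used for ordering
}

function mergeAndRank(
  vectorHits: Hit[],
  textHits: Hit[],
  topK: number,
  vectorWeight = 0.6,
  textWeight = 0.4,
): RankedHit[] {
  const byId = new Map<string, RankedHit>();

  // Fetch the entry for a chunk, creating it on first sight (this is the dedup).
  const entryFor = (chunkId: string): RankedHit => {
    let entry = byId.get(chunkId);
    if (!entry) {
      entry = { chunkId, vectorScore: 0, textScore: 0, score: 0 };
      byId.set(chunkId, entry);
    }
    return entry;
  };

  for (const hit of vectorHits) {
    const entry = entryFor(hit.chunkId);
    entry.vectorScore = Math.max(entry.vectorScore, hit.score);
  }
  for (const hit of textHits) {
    const entry = entryFor(hit.chunkId);
    entry.textScore = Math.max(entry.textScore, hit.score);
  }

  const ranked = Array.from(byId.values());
  for (const entry of ranked) {
    entry.score = vectorWeight * entry.vectorScore + textWeight * entry.textScore;
  }
  ranked.sort((a, b) => b.score - a.score);
  return ranked.slice(0, topK);
}
```

A chunk found by both searches accumulates both terms, which is why agreement between the two retrievers tends to float a result to the top.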

Poke around, break it, read the code

Try the deployed search, skim the architecture notes in the docs, or clone the repo and run it locally — Docker Compose spins up Postgres and Redis in one command.

Portfolio reference build · no accounts, no email list, no billing — just a live deploy of the code in the repo.