Pinecone is a category leader in managed vector databases. If you started building AI agents in 2024-2025, there is a very good chance the RAG layer in your stack is Pinecone — and the rest is glue.
This post is for teams considering the move from Pinecone + LangChain + custom retrieval glue to TheoRAG built into TheoKit. It is honest about what works today, what does not yet, and how to migrate gradually instead of rewriting in a weekend.
Where Pinecone wins
Let us start with the honest losses. Pinecone earned its category leadership.
- Managed scale. Pinecone runs production vector workloads at a scale we have not matched. If your retrieval QPS is in the thousands per second, Pinecone is battle-tested for it. TheoRAG is anchored on Postgres + pgvector, a smart choice for most teams but one with a different scaling profile.
- Filter performance at scale. Pinecone metadata filtering is highly optimized for high-cardinality filters. pgvector + JSONB filters in TheoRAG are pragmatic but not in the same league at extreme scale.
- Maturity of indexing options. Pinecone has years of experience tuning HNSW, IVF, and various index parameters. TheoRAG ships HNSW cosine as the sane default and has not yet exposed the same depth of tuning options.
If your vector workload pushes those limits today, Pinecone is the safer pick. We are honest about that.
Where TheoRAG wins
The cases where TheoRAG makes sense are not the ones where you push pgvector to its limits. They are the cases where the cost of integrating five vendors is the actual bottleneck.
- One runtime, not five vendors. TheoRAG is part of TheoKit. The pipeline (loader → chunker → embedder → vectorstore → retriever → LLM → answer with citations) is one install (`pnpm add @usetheo/rag`), one billing, one observability path. No coordination between Pinecone billing + LangSmith billing + Vercel deploy billing.
- TypeScript end-to-end. The pipeline contract is Zod schemas. The retriever is typed. The answer comes back with `citations: ScoredChunk[]`, fully typed. No `any` escapes, so you can refactor the chunker strategy without breaking the embedder caller.
- No external SaaS in the critical path. TheoRAG runs against Postgres + pgvector. Local dev mode uses a Docker container; production uses your managed Postgres (or TheoCloud). You do not have to negotiate a separate vendor contract for your vector layer.
- DIP by default. `Loader`, `Chunker`, `Embedder`, `Retriever`, and `LlmProvider` are interfaces. Stub providers for CI; OpenAI for production. Swap embedder from OpenAI to Voyage to local without touching the pipeline orchestrator.
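To make the dependency-inversion point concrete, here is a minimal sketch of the pattern. The `Embedder` shape, `StubEmbedder`, and `embedChunks` below are hypothetical names written for this post; they are not the actual `@usetheo/rag` interfaces, which may differ.

```typescript
// Hypothetical interface for illustration; not the real @usetheo/rag type.
interface Embedder {
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic stub for CI: no network calls, and the same text always
// produces the same unit-length vector.
class StubEmbedder implements Embedder {
  readonly dimensions: number;

  constructor(dimensions = 8) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector: number[] = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        const slot = i % this.dimensions;
        vector[slot] = (vector[slot] ?? 0) + text.charCodeAt(i);
      }
      // Normalize so cosine similarity behaves sensibly; guard against zero.
      const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0)) || 1;
      return vector.map((x) => x / norm);
    });
  }
}

// The orchestrating code depends only on the interface, so a production
// embedder can replace the stub without changes here.
async function embedChunks(
  embedder: Embedder,
  chunks: string[],
): Promise<Array<{ text: string; vector: number[] }>> {
  const vectors = await embedder.embed(chunks);
  return vectors.map((vector, i) => ({ text: chunks[i] ?? "", vector }));
}
```

In CI you would pass `new StubEmbedder()`; in production you would pass whatever implements the same interface against a hosted model. The caller does not change either way.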
How to migrate gradually
You do not have to rewrite your agent in a weekend. The migration is incremental.
Step 1 — Set up TheoRAG alongside Pinecone
Install @usetheo/rag in your existing TheoKit (or any TypeScript) project. Configure it against a fresh Postgres + pgvector instance. Index your documents into both Pinecone and TheoRAG in parallel.
```typescript
import { createAnswerPipeline } from '@usetheo/rag/pipeline'
import { createOpenAIEmbedder } from '@usetheo/rag/embedders'
import { createVectorRetriever } from '@usetheo/rag/retrievers'
// ...
```
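The parallel indexing in Step 1 could look roughly like the sketch below. `dualWrite`, `SourceDoc`, and the two indexing callbacks are hypothetical names for illustration. They are not part of the Pinecone SDK or `@usetheo/rag`, and your real clients would sit behind the callbacks.

```typescript
// Hypothetical document shape and indexing callback; adapt to your clients.
type SourceDoc = { id: string; text: string };
type IndexFn = (docs: SourceDoc[]) => Promise<void>;

interface DualWriteReport {
  pineconeOk: boolean;
  theoragOk: boolean;
  errors: string[];
}

// Run one indexing call and turn a failure into a message instead of a throw,
// so one backend failing does not hide the result of the other.
async function attempt(name: string, run: () => Promise<void>): Promise<string | null> {
  try {
    await run();
    return null;
  } catch (error) {
    return `${name}: ${String(error)}`;
  }
}

// Index the same batch into both stores concurrently and report per backend.
async function dualWrite(
  docs: SourceDoc[],
  toPinecone: IndexFn,
  toTheoRag: IndexFn,
): Promise<DualWriteReport> {
  const [pineconeError, theoragError] = await Promise.all([
    attempt("pinecone", () => toPinecone(docs)),
    attempt("theorag", () => toTheoRag(docs)),
  ]);
  const errors = [pineconeError, theoragError].filter(
    (message): message is string => message !== null,
  );
  return {
    pineconeOk: pineconeError === null,
    theoragOk: theoragError === null,
    errors,
  };
}
```

Using the same document IDs in both stores makes the later shadow-query comparison much simpler, since result sets can be compared by ID.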
Step 2 — Shadow-query in production
Run TheoRAG retrieval in parallel with Pinecone for the same agent queries. Log both result sets. Compare recall@5 and answer quality on a representative sample of real queries. This phase typically takes 1-2 weeks.
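One way to score the logged result sets is top-k overlap, treating Pinecone's ranking as the reference. This is a sketch under two assumptions: both stores use the same chunk IDs, and you have no labeled relevance data. True recall@5 needs a ground-truth set of relevant chunks per query, so overlap is a proxy for parity rather than a measure of quality. `ShadowSample`, `overlapAtK`, and `meanOverlapAtK` are names invented here.

```typescript
// Hypothetical log record: ranked chunk IDs returned by each backend.
interface ShadowSample {
  query: string;
  pineconeIds: string[]; // reference ranking
  theoragIds: string[]; // candidate ranking
}

// Fraction of the reference top-k that also appears in the candidate top-k.
function overlapAtK(reference: string[], candidate: string[], k: number): number {
  const referenceTop = reference.slice(0, k);
  if (referenceTop.length === 0) return 1; // nothing to match against
  const candidateTop = new Set(candidate.slice(0, k));
  const hits = referenceTop.filter((id) => candidateTop.has(id)).length;
  return hits / referenceTop.length;
}

// Average overlap across all logged queries.
function meanOverlapAtK(samples: ShadowSample[], k: number): number {
  if (samples.length === 0) return 0;
  const total = samples.reduce(
    (sum, sample) => sum + overlapAtK(sample.pineconeIds, sample.theoragIds, k),
    0,
  );
  return total / samples.length;
}
```

Queries with low overlap are the ones worth reading by hand. A different top 5 is not necessarily a worse top 5, which is why the answer-quality check matters alongside the number.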
Step 3 — Cut over with a feature flag
Once shadow-query metrics are within tolerance, flip the feature flag to route agent retrieval through TheoRAG. Keep Pinecone running for 1-2 weeks as a rollback path.
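A flag-gated retriever with a rollback path might be sketched as follows. Everything here (`Hit`, `RetrieveFn`, `createCutoverRetriever`, and the `usePrimary` flag lookup) is hypothetical and independent of any particular feature-flag service or of the TheoRAG and Pinecone client APIs.

```typescript
// Hypothetical minimal result shape and retriever signature.
type Hit = { id: string; score: number };
type RetrieveFn = (query: string, topK: number) => Promise<Hit[]>;

interface CutoverOptions {
  primary: RetrieveFn; // e.g. a TheoRAG-backed retriever
  fallback: RetrieveFn; // e.g. the existing Pinecone-backed retriever
  usePrimary: () => boolean; // feature-flag lookup, evaluated per request
}

// Route through the primary backend when the flag is on. While the old
// backend is still running, a primary failure falls back instead of
// surfacing to the agent.
function createCutoverRetriever(options: CutoverOptions): RetrieveFn {
  return async (query, topK) => {
    if (!options.usePrimary()) {
      return options.fallback(query, topK);
    }
    try {
      return await options.primary(query, topK);
    } catch (error) {
      console.warn("primary retriever failed, using fallback", error);
      return options.fallback(query, topK);
    }
  };
}
```

Evaluating the flag per request means flipping it back takes effect immediately, without a redeploy. Once the rollback window closes, the wrapper can be replaced with a direct call to the primary retriever.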
Step 4 — Deprecate Pinecone
Drop the Pinecone client. Update billing. Re-baseline your observability dashboards.
Honest cost comparison
A typical agent with sustained traffic — say 100k retrieval queries per month, 10k document upserts per month, 10GB index size:
- Pinecone Starter ($70/mo) + LangSmith Plus ($39/mo) + Vercel Pro ($20/mo) = $129/mo before LLM costs.
- Theo Growth ($79/mo) covers the same workload with RAG + memory + observability + deploy in one bill.
We are not the cheapest at the extreme high end — Pinecone scales further per-dollar on dedicated capacity. We are honest about that.
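For readers who want to plug in their own numbers, the arithmetic above is easy to reproduce. The sketch below uses only the list prices quoted in this post and ignores per-seat charges, usage overages, and LLM costs, so treat it as a template rather than a quote. The `LineItem` type and `monthlyTotal` helper are illustrative names.

```typescript
interface LineItem {
  name: string;
  usdPerMonth: number;
}

// List prices as quoted in this post.
const separateStack: LineItem[] = [
  { name: "Pinecone Starter", usdPerMonth: 70 },
  { name: "LangSmith Plus", usdPerMonth: 39 },
  { name: "Vercel Pro", usdPerMonth: 20 },
];
const bundledStack: LineItem[] = [{ name: "Theo Growth", usdPerMonth: 79 }];

function monthlyTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.usdPerMonth, 0);
}

const separateTotal = monthlyTotal(separateStack); // 129
const bundledTotal = monthlyTotal(bundledStack); // 79
const monthlyDifference = separateTotal - bundledTotal; // 50
const annualDifference = monthlyDifference * 12; // 600
```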
When to pick which
Pick Pinecone if:
- You have a vector workload that needs more than 10k QPS sustained
- You need extreme filter performance on high-cardinality metadata
- You already have Pinecone integrated and the migration is not the bottleneck
Pick TheoRAG if:
- You want one runtime, one billing, one observability surface
- You are TypeScript-first
- You want to migrate to local-first dev mode (no external SaaS in CI)
- You are sick of stitching LangChain + Pinecone + Mem0 + LangSmith + Vercel
You can also pick both. TheoKit can call out to a Pinecone client from inside a TheoKit handler if you want to use TheoRAG for the bulk of retrieval and Pinecone for a specific high-scale use case.
Honest comparison at /compare/theo-vs-langchain. The full RAG pipeline is documented at docs.usetheo.dev and the repository is @usetheo/rag on GitHub.
If this post resonates, push back or build on the idea in Discord.
