Key Features
- HNSW ANN Index: Cosine, squared L2, or inner-product distance, with tunable efConstruction and efSearch parameters that trade build time and query latency against recall.
- Multimodal Storage: Store text, images, audio bytes plus JSON metadata in a unified database.
- Metadata Filtering: Filter on metadata with $eq, $ne, $in, $and, $or, and on document text with $contains, for precise queries.
- Full-Text Search: BM25 full-text search via ft_query for hybrid search capabilities.
- Flexible Embeddings: Defaults to all-MiniLM-L6-v2; plug in custom embedding functions via embedding_function parameter.
- Persistence Options: PersistentClient(path='./db') for disk storage or ephemeral Client() for in-memory operations.
- Multi-Language Support: Native clients for Python, JavaScript, Go, and C# with consistent APIs.
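To make the filter operators listed above concrete, here is a minimal pure-Python sketch of how a `where` filter of this shape could be evaluated against one record's metadata. It illustrates the semantics only and is not Chroma's implementation; the `matches` helper and the sample records are hypothetical. `$contains` is left out because in Chroma it is typically applied to document text through `where_document` rather than to metadata.

```python
from typing import Any, Mapping


def matches(metadata: Mapping[str, Any], where: Mapping[str, Any]) -> bool:
    """Return True if `metadata` satisfies `where` (illustrative semantics only)."""
    for key, cond in where.items():
        if key == "$and":
            # Every sub-filter must hold.
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            # At least one sub-filter must hold.
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            # A bare value is treated as shorthand for {"$eq": value}.
            if not isinstance(cond, Mapping):
                cond = {"$eq": cond}
            value = metadata.get(key)
            for op, operand in cond.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$ne":
                    ok = value != operand
                elif op == "$in":
                    ok = value in operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


# Hypothetical records, used only to exercise the filter.
records = [
    {"id": "doc1", "meta": {"topic": "cats", "year": 2023}},
    {"id": "doc2", "meta": {"topic": "dogs", "year": 2024}},
    {"id": "doc3", "meta": {"topic": "cats", "year": 2024}},
]
where = {"$and": [{"topic": {"$eq": "cats"}}, {"year": {"$in": [2024, 2025]}}]}
selected = [r["id"] for r in records if matches(r["meta"], where)]
print(selected)  # ['doc3']
```

In Chroma itself, a dictionary of this shape would be passed as the `where` argument of a query rather than evaluated by hand.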
Code Examples
Python Installation & Setup
bash
pip install chromadb==1.0.12
Python Basic Usage
python
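import chromadb

# Open (or create) an on-disk database in ./chroma_db
client = chromadb.PersistentClient(path="./chroma_db")
col = client.get_or_create_collection("docs")

# Documents are embedded with the default model when added
col.add(ids=["doc1"], documents=["Cats are great"])

# Query by text; results are nested per query, hence [0][0]
hits = col.query(query_texts=["felines"], n_results=1)
print(hits["documents"][0][0])  # Cats are great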
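The query above relies on the default embedding model. A custom embedding function is, to a first approximation, a callable that takes a list of documents and returns one vector per document. The sketch below is a toy stand-in that hashes tokens into a fixed-size vector; the `HashingEmbedder` class and its `dim` parameter are hypothetical, and a hashing scheme like this carries no semantic meaning, so it would not match "felines" to "cats".

```python
import hashlib
import math
from typing import List


class HashingEmbedder:
    """Toy embedding function: hashes tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Map each token to a stable bucket via its MD5 digest.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dim
                vec[bucket] += 1.0
            # Normalize so that cosine and dot-product comparisons agree.
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            vectors.append([x / norm for x in vec])
        return vectors


embed = HashingEmbedder(dim=64)
vectors = embed(["Cats are great", "cats are great"])
print(len(vectors), len(vectors[0]))  # 2 64
```

An instance could then be supplied through the `embedding_function` parameter when the collection is created. Depending on the installed Chroma release, the client may expect a subclass of its own embedding-function interface or warn about plain callables, so treat this as a starting point rather than a drop-in.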
JavaScript/TypeScript Setup
bash
npm install chromadb@latest
JavaScript/TypeScript Usage
typescript
import { ChromaClient } from 'chromadb';

// Connect to a Chroma server that is already running on port 8000
const client = new ChromaClient({ path: 'http://localhost:8000' });
// getOrCreateCollection keeps the script re-runnable; createCollection fails if 'docs' already exists
const col = await client.getOrCreateCollection({ name: 'docs' });
await col.add({ ids: ['doc1'], documents: ['Cats are great'] });
const res = await col.query({ queryTexts: ['felines'], nResults: 1 });
console.log(res.documents[0][0]); // Cats are great
LangChain Integration
python
# Install the bridge and the OpenAI embeddings package
# pip install langchain-chroma langchain-openai
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable
vector_store = Chroma(
    collection_name="my_collection",
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings(),
)
vector_store.add_texts(["Cats are great"])
docs = vector_store.similarity_search("felines", k=1)
print(docs[0].page_content)
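Conceptually, a similarity search embeds the query and returns the k stored texts whose vectors lie closest to it. The sketch below does this by brute force with cosine distance over hand-made 3-dimensional vectors. Chroma uses an approximate HNSW index rather than an exhaustive scan, and the vectors, the `cosine_distance` function, and the `top_k` helper here are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 1) -> List[Tuple[float, str]]:
    """Exhaustively score every stored vector and return the k closest texts."""
    scored = [(cosine_distance(query, vec), text) for text, vec in store.items()]
    return sorted(scored)[:k]


# Hand-made vectors standing in for real embeddings.
store = {
    "Cats are great": [0.9, 0.1, 0.0],
    "Stock markets fell": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]  # stand-in for the embedding of "felines"

results = top_k(query, store, k=1)
print(results[0][1])  # Cats are great
```

An exhaustive scan costs one distance computation per stored vector, which is why an approximate index becomes attractive as collections grow.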
Deployment Options
bash
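# Docker: run the server on port 8000
docker run -p 8000:8000 chromadb/chroma:latest
# Docker Compose: requires a docker-compose.yml in the current directory
docker-compose up
# Kubernetes: assumes a Helm repository named chromadb has already been added
helm install chroma chromadb/chroma
# CLI: serve a local database directory
chroma run --path ./db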
Use Cases
- RAG prototypes - Zero-config setup in notebooks
- Semantic search - Text + images + metadata queries
- Local dev/test - No infrastructure, SQLite persistence (DuckDB + Parquet in releases before 0.4)
- LangChain pipelines - QA, summarization, retrieval workflows
- Production deployments - Single-node up to ~100M vectors / 250GB RAM
Pros & Cons
Advantages
- Zero-config setup - Starts in-memory, persists to disk seamlessly
- Multimodal support - Text, images, audio with unified API
- Production-ready - Docker, Kubernetes, Helm chart deployment options
- LangChain integration - Native langchain-chroma bridge available
- Flexible persistence - In-memory or on-disk SQLite storage (DuckDB + Parquet in releases before 0.4)
Disadvantages
- Single-node only - Horizontal scaling requires external orchestration
- Metadata size limit - 16KB per record; larger blobs need external storage
- No built-in RBAC - The community relies on proxy-based solutions
- Memory constraints - ~100M vectors / 250GB RAM single-node limit
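The ~100M vectors / 250GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes 384-dimensional float32 vectors (the output size of all-MiniLM-L6-v2) and a hypothetical HNSW link count of M=16 with 4-byte neighbor ids on the base layer. The `estimate_ram_gb` helper and its parameters are illustrative assumptions; actual usage depends on index settings, metadata, and stored documents.

```python
def estimate_ram_gb(
    num_vectors: int,
    dim: int = 384,          # all-MiniLM-L6-v2 output size
    bytes_per_float: int = 4,
    m: int = 16,             # assumed HNSW link count
    bytes_per_link: int = 4,
) -> float:
    """Rough RAM estimate in GB (10^9 bytes) for raw vectors plus base-layer HNSW links."""
    vector_bytes = num_vectors * dim * bytes_per_float
    # Assumes up to 2*M neighbor ids per element on the base layer; upper layers ignored.
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return (vector_bytes + link_bytes) / 1e9


print(round(estimate_ram_gb(100_000_000), 1))  # 166.4
```

Under these assumptions the vectors and graph links alone take roughly 166 GB, which leaves headroom for metadata, documents, and process overhead and is of the same order as the limit quoted above.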
Future Outlook & Integrations
- Chroma Cloud Beta [H2 2025]: Managed service with autoscaling & RBAC capabilities
- Distributed HNSW [Under Development]: Harmony prototype for horizontal scaling, currently at the RFC stage
- Enhanced RBAC [Planned]: Built-in role-based access control for enterprise deployments
- Multi-node Clustering [Future]: Native horizontal scaling without external orchestration