Official Resources

Key Features

  • HNSW ANN Index: Cosine/SSE distance with tunable efConstruction and efSearch parameters for optimized similarity search.
  • Multimodal Storage: Store text, images, audio bytes plus JSON metadata in a unified database.
  • Metadata Filtering: Advanced filtering with $eq, $ne, $in, $contains, $and, $or operators for precise queries.
  • Full-Text Search: BM25 full-text search via ft_query for hybrid search capabilities.
  • Flexible Embeddings: Defaults to all-MiniLM-L6-v2; plug in custom embedding functions via embedding_function parameter.
  • Persistence Options: PersistentClient(path='./db') for disk storage or ephemeral Client() for in-memory operations.
  • Multi-Language Support: Native clients for Python, JavaScript, Go, and C# with consistent APIs.

Code Examples

Python Installation & Setup

bash
pip install chromadb==1.0.12

Python Basic Usage

python
import chromadb
client = chromadb.PersistentClient("./chroma_db")
col = client.get_or_create_collection("docs")
col.add(ids=["doc1"], documents=["Cats are great"])
hits = col.query(query_texts=["felines"], n_results=1)
print(hits["documents"][0][0])  # Cats are great

JavaScript/TypeScript Setup

bash
npm install chromadb@latest

JavaScript/TypeScript Usage

typescript
import { ChromaClient } from 'chromadb';
const client = new ChromaClient({ path: 'http://localhost:8000' });
const col = await client.createCollection({ name: 'docs' });
await col.add({ ids: ['doc1'], documents: ['Cats are great'] });
const res = await col.query({ queryTexts: ['felines'], nResults: 1 });
console.log(res.documents[0][0]);  // Cats are great

LangChain Integration

python
# Install the bridge
# pip install langchain-chroma

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="my_collection",
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings()
)
vector_store.add_texts(["Cats are great"])
docs = vector_store.similarity_search("felines", k=1)

Docker Deployment

bash
# Docker
docker run -p 8000:8000 chromadb/chroma:latest

# Docker Compose
docker-compose up

# Kubernetes
helm install chroma chromadb/chroma

# CLI
chroma run --path ./db

Use Cases

  • RAG prototypes - Zero-config setup in notebooks
  • Semantic search - Text + images + metadata queries
  • Local dev/test - No infrastructure, SQLite/DuckDB persistence
  • LangChain pipelines - QA, summarization, retrieval workflows
  • Production deployments - Single-node up to ~100M vectors / 250GB RAM

Pros & Cons

Advantages

  • Zero-config setup - Starts in-memory, persists to disk seamlessly
  • Multimodal support - Text, images, audio with unified API
  • Production-ready - Docker, Kubernetes, Helm chart deployment options
  • LangChain integration - Native langchain-chroma bridge available
  • Flexible persistence - SQLite, DuckDB + Parquet backends

Disadvantages

  • Single-node only - Horizontal scaling via external orchestration
  • Metadata size limit - 16KB per record, large blobs need external storage
  • No built-in RBAC - Community uses proxy solutions
  • Memory constraints - ~100M vectors / 250GB RAM single-node limit

Future Outlook & Integrations

  • Chroma Cloud Beta [H2 2025]: Managed service with autoscaling & RBAC capabilities
  • Distributed HNSW [Under Development]: Harmony prototype for horizontal scaling under RFC
  • Enhanced RBAC [Planned]: Built-in role-based access control for enterprise deployments
  • Multi-node Clustering [Future]: Native horizontal scaling without external orchestration