ChromaDB

Website | GitHub | Documentation

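An AI-native vector database with a Rust core, official Python and JavaScript/TypeScript clients, and community-maintained clients for languages such as Go and C#. It starts in-memory and can be persisted to disk (SQLite in current releases; DuckDB + Parquet in legacy versions), and features HNSW ANN indexing and multimodal storage.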

Official Resources

Cookbook | Issue Tracker | Harmony RFC | HoneyBee Paper

Key Features

HNSW ANN Index: Squared L2 (the default), inner-product, or cosine distance, with tunable efConstruction and efSearch parameters that trade recall against build and query speed.
Multimodal Storage: Store text, images, audio bytes plus JSON metadata in a unified database.
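Metadata Filtering: Filter on metadata via where, using $eq, $ne, $in, $nin, and the comparison operators $gt, $gte, $lt, $lte, combined with $and and $or.
Full-Text Search: Keyword filtering on document text via where_document (for example $contains), which can be combined with a vector query for hybrid search.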
Flexible Embeddings: Defaults to all-MiniLM-L6-v2; plug in custom embedding functions via embedding_function parameter.
Persistence Options: PersistentClient(path='./db') for disk storage or ephemeral Client() for in-memory operations.
Multi-Language Support: Official Python and JavaScript/TypeScript clients, with community-maintained clients for Go and C# that follow similar APIs.
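
The distance options behind the HNSW index can be written out in a few lines. The following is a minimal pure-Python sketch of the three formulas as Chroma's documentation defines them (squared L2, inner product, cosine); the function names and sample vectors are illustrative and not part of the chromadb API.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """Inner-product distance: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Illustrative vectors: one parallel to the query, one orthogonal to it.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 1.0]

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(name, squared_l2(query, vec), cosine_distance(query, vec))
```

Cosine distance ignores vector length, so the parallel vector scores 0 even though its squared L2 distance is 1; this is why cosine is a common choice for text embeddings that are not normalized.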

Code Examples

Python Installation & Setup

bash

uv add chromadb==1.0.12

Python Basic Usage

python

import chromadb

# Persist data under ./chroma_db (created on first use)
client = chromadb.PersistentClient(path="./chroma_db")
col = client.get_or_create_collection("docs")

# Documents are embedded with the default model (all-MiniLM-L6-v2)
col.add(ids=["doc1"], documents=["Cats are great"])
hits = col.query(query_texts=["felines"], n_results=1)
print(hits["documents"][0][0])  # Cats are great

JavaScript/TypeScript Setup

bash

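# Recent chromadb releases ship the default embedding function as a separate package
pnpm add chromadb@latest @chroma-core/default-embed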

JavaScript/TypeScript Usage

typescript

import { ChromaClient } from 'chromadb';

// Connects to a running Chroma server; newer client versions prefer { host, port } over path
const client = new ChromaClient({ path: 'http://localhost:8000' });
// getOrCreateCollection avoids an error when 'docs' already exists
const col = await client.getOrCreateCollection({ name: 'docs' });
await col.add({ ids: ['doc1'], documents: ['Cats are great'] });
const res = await col.query({ queryTexts: ['felines'], nResults: 1 });
console.log(res.documents[0][0]);  // Cats are great

LangChain Integration

python

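# Install the bridge and the OpenAI embeddings package
# uv add langchain-chroma langchain-openai
# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable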

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="my_collection",
    persist_directory="./db",
    embedding_function=OpenAIEmbeddings()
)
vector_store.add_texts(["Cats are great"])
docs = vector_store.similarity_search("felines", k=1)

Docker Deployment

bash

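# Docker: expose the server on localhost:8000
docker run -p 8000:8000 chromadb/chroma:latest

# Docker Compose: requires a compose file in the current directory
docker compose up

# Kubernetes: assumes a Helm repo aliased "chromadb" that provides a "chroma" chart has been added
helm install chroma chromadb/chroma

# CLI: run a local server that stores data in ./db
chroma run --path ./db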

Use Cases

RAG prototypes - Zero-config setup in notebooks
Semantic search - Text + images + metadata queries
Local dev/test - No infrastructure, SQLite persistence (DuckDB in legacy versions)
LangChain pipelines - QA, summarization, retrieval workflows
Production deployments - Single-node up to ~100M vectors / 250GB RAM

Pros & Cons

Advantages

Zero-config setup - Starts in-memory, persists to disk seamlessly
Multimodal support - Text, images, audio with unified API
Production-ready - Docker, Kubernetes, Helm chart deployment options
LangChain integration - Native langchain-chroma bridge available
Flexible persistence - SQLite backend, with DuckDB + Parquet in legacy versions

Disadvantages

Single-node only - Horizontal scaling requires external orchestration
Metadata size limit - 16KB per record; large blobs need external storage
No built-in RBAC - Access control is typically handled by a proxy in front of the server
Memory constraints - ~100M vectors / 250GB RAM single-node limit
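
One way to work within the metadata size limit is to check a record's metadata before adding it and, if it is too large, keep the blob elsewhere and store only a pointer. The following is a minimal sketch: the 16KB figure comes from the list above, while measuring size as UTF-8 JSON bytes, the helper names, and the example URI are assumptions made for illustration.

```python
import json

# Limit quoted in the list above; how the size is measured is an assumption.
METADATA_LIMIT_BYTES = 16 * 1024


def metadata_size(metadata: dict) -> int:
    """Approximate metadata size as the byte length of its UTF-8 JSON encoding."""
    return len(json.dumps(metadata, ensure_ascii=False).encode("utf-8"))


def externalize_blob(metadata: dict, blob_key: str, blob_uri: str,
                     limit: int = METADATA_LIMIT_BYTES) -> dict:
    """Return metadata unchanged if it fits; otherwise swap the blob for a URI."""
    if metadata_size(metadata) <= limit:
        return metadata
    slim = {k: v for k, v in metadata.items() if k != blob_key}
    slim[f"{blob_key}_uri"] = blob_uri  # pointer to wherever the blob is stored
    return slim


# Hypothetical record with a transcript too large to keep inline.
record = {"title": "Cat talk", "transcript": "meow " * 5000}
safe = externalize_blob(record, "transcript", "s3://example-bucket/cat-talk.txt")
print(sorted(safe), metadata_size(safe))
```

The caller is still responsible for writing the blob to the external store; the helper only decides what goes into the record's metadata.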

Future Outlook & Integrations

Chroma Cloud Beta [H2 2025]: Managed service with autoscaling & RBAC capabilities
Distributed HNSW [Under Development]: Harmony prototype for horizontal scaling, currently at the RFC stage
Enhanced RBAC [Planned]: Built-in role-based access control for enterprise deployments
Multi-node Clustering [Future]: Native horizontal scaling without external orchestration