Key Features
- Rust Core with HNSW: Built in Rust with HNSW graph indexing, a payload-aware inverted index, and RocksDB-backed persistence for high performance.
- GPU Acceleration: Optional GPU indexing with CUDA 12, ROCm 5.7, and oneAPI 2024 support; index builds up to 10× faster with the gpu_indexing=true flag.
- Advanced Quantization: Scalar (int8), Product (PQ128), and Binary quantization, switchable at runtime, saving 75-97% RAM versus float32.
- Hybrid Search: SPLADEv2 sparse vectors combined with dense vectors for hybrid dense+sparse scoring.
- Enterprise Security: TLS 1.3, AES-256 encryption at rest, Cloud RBAC with scoped API keys (TTL ≤90 days), and SSO (SAML 2.0, OIDC).
- Observability: Prometheus metrics (e.g., qdrant_collection_size_bytes, qdrant_gpu_index_time) and bundled Grafana dashboard JSON.
- Multi-Language SDKs: Python, Node.js, Go, Java, .NET, and Rust clients, all updated to v1.13.x with async and GPU support.
Code Examples
Local Docker with GPU
```bash
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  --gpus all \
  qdrant/qdrant:v1.13.1
```
Kubernetes with GPU
```bash
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --set gpuIndexing.enabled=true \
  --set "resources.limits.nvidia\.com/gpu=1"
```
Python GPU Index Build
```python
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="arxiv",
    vectors_config={"size": 768, "distance": "Cosine"},
    optimizers_config={"indexing_threshold": 0},  # start indexing immediately
    hnsw_config={"m": 32, "ef_construct": 256, "gpu_indexing": True},  # GPU index build
)
```
SDK Matrix Overview
```text
# Multi-language SDK support (all v1.13.x):
# Python:  qdrant-client (async + GPU)
# Node.js: @qdrant/js-client-rest
# Go:      github.com/qdrant/go-client
# Java:    io.qdrant:client
# .NET:    Qdrant.Client
# Rust:    qdrant-client (crate)
# Architecture limits:
# - Single-node: ~2B vectors / 8k dims / 1TB RAM per node
# - Security: TLS 1.3, disk AES-256, Cloud RBAC
# - Quantization: Scalar (int8), Product (PQ128), Binary
```
Use Cases
- RAG systems - LangChain (Qdrant.from_documents), Vertex AI RAG Engine, Haystack QdrantDocumentStore
- Recommendation systems - Hybrid dense+sparse vectors with payload filtering by user_id, category
- Real-time applications - Hot/cold tiering via collection snapshots + S3 restore
- Enterprise search - Scoped API keys per tenant, SSO enforced via Cloud IAM
- High-performance search - 20k QPS single-node with Rust + HNSW optimization
Pros & Cons
Advantages
- Rust + HNSW performance - 20k QPS single-node capability
- GPU indexing acceleration - Cuts an ef_construct=256 index build from 45 min to 4 min
- Quantization efficiency - Saves 75-97% RAM vs float32
- Enterprise features - RBAC, SSO, audit logs in Cloud
- OpenAPI spec - Auto-generates clients for multiple languages
Disadvantages
- Single-node limitation - No native sharding (app-level routing workaround)
- GPU complexity - Requires GPU drivers & memory tuning
- Metadata constraints - Soft-limit ~32kB per field
- Cloud free tier limits - 2k vectors, 1GB RAM only
- Hybrid query complexity - Needs payload schema discipline
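The app-level routing workaround for the single-node limitation can be sketched as deterministic hashing of a tenant key onto a fixed set of nodes. The node URLs and helper below are hypothetical, not part of any Qdrant API:

```python
import hashlib

# Hypothetical fleet of independent single-node Qdrant instances.
NODES = ["http://qdrant-0:6333", "http://qdrant-1:6333", "http://qdrant-2:6333"]

def route(tenant_id: str) -> str:
    """Map a tenant key to one node; the same tenant always hits the same node."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

print(route("user-42"))
```

Because the mapping is stable, each tenant's vectors live on exactly one node; rebalancing after adding a node, however, is the application's problem, which is what native sharding would solve.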
Future Outlook & Integrations
- Distributed Mode [Engineering Preview]: Raft consensus and shard replication (branch dev-distributed)
- 4-bit Quantization [Target v1.15]: int4 product quantization for a further 2× memory reduction
- Hybrid-Cloud BYOC [Beta Q4]: Deploy managed control-plane inside customer VPC
- Cross-Collection Joins [Next 6 Months]: Unified sparse+dense scoring across namespaces