Key Features
- Rust Core with HNSW: Built in Rust with HNSW graph indexing, a payload-aware inverted index, and RocksDB-backed persistence for high performance.
- GPU Acceleration: Optional GPU indexing with CUDA 12, ROCm 5.7, and oneAPI 2024 support; index builds up to 10× faster with the gpu_indexing=true flag.
- Advanced Quantization: Scalar (int8), Product (PQ128), and Binary quantization, switchable at runtime, saving 75-97% RAM versus float32.
- Hybrid Search: SPLADEv2 sparse vectors combined with dense vectors for hybrid dense+sparse scoring.
- Enterprise Security: TLS 1.3, AES-256 encryption at rest, Cloud RBAC with scoped API keys (TTL ≤90 days), and SSO (SAML 2.0, OIDC).
- Observability: Prometheus metrics (e.g., qdrant_collection_size_bytes, qdrant_gpu_index_time) and bundled Grafana dashboard JSON.
- Multi-Language SDKs: Python, Node.js, Go, Java, .NET, and Rust clients, all updated to v1.13.x with async and GPU support.
Code Examples
Local Docker with GPU
```bash
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  --gpus all \
  qdrant/qdrant:v1.13.1
```
Kubernetes with GPU
```bash
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --set gpuIndexing.enabled=true \
  --set "resources.limits.nvidia\.com/gpu=1"
```
Python GPU Index Build
```python
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="arxiv",
    vectors_config={"size": 768, "distance": "Cosine"},
    optimizers_config={"indexing_threshold": 0},  # start indexing immediately
    hnsw_config={"m": 32, "ef_construct": 256, "gpu_indexing": True},  # GPU index build
)
```
SDK Matrix Overview
```text
# Multi-language SDK support (all v1.13.x):
# Python:  qdrant-client (async + GPU)
# Node.js: @qdrant/js-client-rest
# Go:      github.com/qdrant/go-client
# Java:    io.qdrant:client
# .NET:    Qdrant.Client
# Rust:    qdrant-client (crate)
# Architecture limits:
# - Single-node: ~2B vectors / 8k dims / 1TB RAM per node
# - Security: TLS 1.3, disk AES-256, Cloud RBAC
# - Quantization: Scalar (int8), Product (PQ128), Binary
```
Use Cases
- RAG systems - LangChain (Qdrant.from_documents), Vertex AI RAG Engine, Haystack QdrantDocumentStore
- Recommendation systems - Hybrid dense+sparse vectors with payload filtering by user_id, category
- Real-time applications - Hot/cold tiering via collection snapshots + S3 restore
- Enterprise search - Scoped API keys per tenant, SSO enforced via Cloud IAM
- High-performance search - 20k QPS single-node with Rust + HNSW optimization
Pros & Cons
Advantages
- Rust + HNSW performance - 20k QPS single-node capability
- GPU indexing acceleration - Cuts an ef_construct=256 index build from 45 min to 4 min
- Quantization efficiency - Saves 75-97% RAM vs float32
- Enterprise features - RBAC, SSO, audit logs in Cloud
- OpenAPI spec - Auto-generates clients for multiple languages
Disadvantages
- Single-node limitation - No native sharding (app-level routing workaround)
- GPU complexity - Requires GPU drivers & memory tuning
- Metadata constraints - Soft-limit ~32kB per field
- Cloud free tier limits - 2k vectors, 1GB RAM only
- Hybrid query complexity - Needs payload schema discipline
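The app-level routing workaround for the single-node limitation can be sketched as deterministic hashing of a tenant key onto a fixed set of nodes. The node URLs and helper below are hypothetical, not part of any Qdrant API:

```python
import hashlib

# Hypothetical fleet of independent single-node Qdrant instances.
NODES = ["http://qdrant-0:6333", "http://qdrant-1:6333", "http://qdrant-2:6333"]

def route(tenant_id: str) -> str:
    """Map a tenant key to one node; the same tenant always hits the same node."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

print(route("user-42"))
```

Because the mapping is stable, each tenant's vectors live on exactly one node; rebalancing after adding a node, however, is the application's problem, which is what native sharding would solve.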
Future Outlook & Integrations
- Distributed Mode [Engineering Preview]: Raft consensus and shard replication (branch dev-distributed)
- 4-bit Quantization [Target v1.15]: int4 product quantization for a further 2× memory reduction
- Hybrid-Cloud BYOC [Beta Q4]: Deploy managed control-plane inside customer VPC
- Cross-Collection Joins [Next 6 Months]: Unified sparse+dense scoring across namespaces