Official Resources

Key Features

  • Rust Core with HNSW: Built in Rust with HNSW graph indexing, a payload-aware inverted index, and RocksDB-backed persistence for high performance.
  • GPU Acceleration: Optional GPU indexing with support for CUDA 12, ROCm 5.7, and oneAPI 2024; enabling the gpu_indexing=true flag yields roughly 10× faster index builds.
  • Advanced Quantization: Scalar (int8), Product (PQ128), and Binary quantization, switchable at runtime, cut RAM usage by 75-97% versus float32.
  • Hybrid Search: SPLADEv2 sparse vectors combined with dense embeddings for unified dense+sparse scoring.
  • Enterprise Security: TLS 1.3, disk AES-256 encryption, Cloud RBAC with scoped API keys (TTL ≤90 days), SSO (SAML 2.0, OIDC).
  • Observability: Prometheus metrics (qdrant_collection_size_bytes, qdrant_gpu_index_time), Grafana dashboard JSON included.
  • Multi-Language SDKs: Python, Node.js, Go, Java, .NET, Rust clients all updated to v1.13.x with async and GPU support.
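The quantization savings quoted above follow from per-dimension storage cost: float32 uses 4 bytes per dimension, int8 scalar quantization 1 byte, and binary quantization 1 bit. A sketch of that arithmetic (illustrative only; real on-disk size also includes HNSW graph links and payload):

```python
def vector_ram_bytes(dims: int, bits_per_dim: int) -> float:
    """Raw vector storage for one embedding at a given precision."""
    return dims * bits_per_dim / 8

def savings_vs_float32(bits_per_dim: int) -> float:
    """Fraction of RAM saved relative to 4-byte (32-bit) float32 components."""
    return 1 - bits_per_dim / 32

# 768-dim embeddings at each precision:
for name, bits in [("float32", 32), ("int8 scalar", 8), ("binary", 1)]:
    print(f"{name:12s} {vector_ram_bytes(768, bits):7.0f} B  "
          f"saves {savings_vs_float32(bits):.1%}")
```

int8 lands exactly at the 75% end of the quoted range, and binary (1 bit per dimension) at roughly 97%.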

Code Examples

Local Docker with GPU

bash
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  --gpus all \
  qdrant/qdrant:v1.13.1
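Once the container is up, readiness can be checked against the REST port mapped above. A small stdlib-only sketch (the /healthz endpoint is Qdrant's standard health probe; host and port match the docker run flags):

```python
import urllib.request

def qdrant_ready(host: str = "localhost", port: int = 6333, timeout: float = 2.0) -> bool:
    """Return True if the Qdrant REST endpoint answers the health probe."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused / timeout: container not (yet) reachable.
        return False

# With the container above running, qdrant_ready() returns True.
```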

Kubernetes with GPU

bash
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm install qdrant qdrant/qdrant \
  --set gpuIndexing.enabled=true \
  --set 'resources.limits.nvidia\.com/gpu=1'

Python GPU Index Build

python
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)

client.create_collection(
    collection_name="arxiv",
    vectors_config={"size": 768, "distance": "Cosine"},
    optimizers_config={"indexing_threshold": 0},   # index immediately, no deferral
    hnsw_config={"m": 32, "ef_construct": 256, "gpu_indexing": True}
)
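After the collection exists, points are upserted via PUT /collections/{name}/points and queried via POST /collections/{name}/points/search. A stdlib-only sketch of the request bodies (field names follow Qdrant's REST API; the vector values are placeholders):

```python
def upsert_body(points):
    """Body for PUT /collections/{name}/points; points = (id, vector, payload) tuples."""
    return {"points": [
        {"id": pid, "vector": vec, "payload": payload}
        for pid, vec, payload in points
    ]}

def search_body(query_vector, limit=10, score_threshold=None):
    """Body for POST /collections/{name}/points/search."""
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    if score_threshold is not None:
        body["score_threshold"] = score_threshold
    return body

upsert = upsert_body([(1, [0.05] * 768, {"title": "Attention Is All You Need"})])
search = search_body([0.05] * 768, limit=5)
```

The same shapes can be sent with any HTTP client, or expressed through qdrant-client's upsert and search methods.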

SDK Matrix Overview

text
# Multi-language SDK support (all v1.13.x):
# Python: qdrant-client (async + GPU)
# Node.js: @qdrant/js-client-rest
# Go: github.com/qdrant/go-client
# Java: io.qdrant:client
# .NET: Qdrant.Client
# Rust: qdrant-client (crate)

# Architecture limits:
# - Single-node: ~2B vectors / 8k dims / 1TB RAM per node
# - Security: TLS 1.3, disk AES-256, Cloud RBAC
# - Quantization: Scalar (int8), Product (PQ128), Binary

Use Cases

  • RAG systems - LangChain (Qdrant.from_documents), Vertex AI RAG Engine, Haystack QdrantDocumentStore
  • Recommendation systems - Hybrid dense+sparse vectors with payload filtering by user_id, category
  • Real-time applications - Hot/cold tiering via collection snapshots + S3 restore
  • Enterprise search - Scoped API keys per tenant, SSO enforced via Cloud IAM
  • High-performance search - 20k QPS single-node with Rust + HNSW optimization
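For the hybrid dense+sparse use cases above, the two ranked result lists ultimately have to merge into one ranking. Reciprocal rank fusion is a common client-side way to do this (a sketch only; Qdrant can also score sparse and dense vectors server-side, and k=60 is just the conventional default constant):

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked ID lists via reciprocal rank fusion.

    Each ranking is a list of point IDs, best first; a point's fused
    score is the sum of 1/(k + rank) over every ranking it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]    # IDs from the dense (embedding) search
sparse = ["b", "d", "a"]   # IDs from the sparse (SPLADE) search
fused = rrf_fuse([dense, sparse])
```

Points ranked highly in both lists ("b" here) float to the top even when neither search put them first.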

Pros & Cons

Advantages

  • Rust + HNSW performance - 20k QPS single-node capability
  • GPU indexing acceleration - Cuts an ef_construct=256 index build from 45 min to 4 min
  • Quantization efficiency - Saves 75-97% RAM vs float32
  • Enterprise features - RBAC, SSO, audit logs in Cloud
  • OpenAPI spec - Auto-generates clients for multiple languages

Disadvantages

  • Single-node limitation - No native sharding (app-level routing workaround)
  • GPU complexity - Requires GPU drivers & memory tuning
  • Metadata constraints - Soft limit of ~32 kB per payload field
  • Cloud free tier limits - 2k vectors, 1GB RAM only
  • Hybrid query complexity - Needs payload schema discipline
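The single-node limitation is usually worked around with application-level routing: derive a shard deterministically from the point ID and send reads and writes for that ID to the owning node. A minimal sketch (node URLs are hypothetical; a real setup would also handle fan-out queries and re-sharding):

```python
import hashlib

NODES = ["http://qdrant-0:6333", "http://qdrant-1:6333", "http://qdrant-2:6333"]

def node_for(point_id: str, nodes=NODES) -> str:
    """Deterministically route a point ID to one Qdrant node.

    md5 gives a hash that is stable across processes and restarts
    (unlike Python's built-in hash()), so every writer and reader
    agrees on which node owns a given point.
    """
    digest = hashlib.md5(point_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

Because the mapping is pure, any client computes the same owner for "doc-42" without coordination; cross-shard searches still require querying every node and merging results client-side.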

Future Outlook & Integrations

  • Distributed Mode [Engineering Preview]: Raft consensus and shard replication (branch dev-distributed)
  • 4-bit Quantization [Target v1.15]: int4 product quantization for a further 2× memory reduction
  • Hybrid-Cloud BYOC [Beta Q4]: Deploy managed control-plane inside customer VPC
  • Cross-Collection Joins [Next 6 Months]: Unified sparse+dense scoring across namespaces