JVector Advanced Usage
This section covers advanced usage patterns for production deployments.
On-Disk Index with Compression
For large datasets that exceed available memory, combine on-disk storage with Product Quantization (PQ) compression.
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .maxDegree(32)
    .beamWidth(200)
    // On-disk storage
    .onDisk(true)
    .indexDirectory(Path.of("/data/vectors"))
    // PQ compression (reduces memory significantly)
    .enablePqCompression(true)
    .pqSubspaces(48) // Must divide dimension evenly
    .build();
PQ compression automatically enforces maxDegree=32 due to the FusedPQ algorithm constraint.
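The "must divide dimension evenly" rule and the rough memory effect can be checked with a few lines of arithmetic. The sketch below is only an illustration: `PqSizing` is a hypothetical helper, not part of the library, and the size estimate assumes one byte per subspace code (256 centroids per subspace), which is a common PQ setup but is not confirmed by this document. Codebook and graph memory are ignored.

```java
/** Hypothetical helper for illustration; not part of the vector index API. */
final class PqSizing
{
    /** True if a vector of the given dimension splits into equally sized subspaces. */
    static boolean isValidSubspaceCount(int dimension, int subspaces)
    {
        return subspaces > 0 && subspaces <= dimension && dimension % subspaces == 0;
    }

    /** Bytes needed for one uncompressed float32 vector. */
    static long rawBytes(int dimension)
    {
        return (long) dimension * Float.BYTES;
    }

    /** Bytes for one PQ code, ASSUMING one byte (256 centroids) per subspace. */
    static long pqBytes(int subspaces)
    {
        return subspaces;
    }

    public static void main(String[] args)
    {
        int dimension = 768;
        int subspaces = 48;
        if (!isValidSubspaceCount(dimension, subspaces))
        {
            throw new IllegalArgumentException("pqSubspaces must divide dimension evenly");
        }
        System.out.printf("raw=%d bytes, pq=%d bytes per vector%n",
            rawBytes(dimension), pqBytes(subspaces));
    }
}
```

For the configuration above (dimension 768, 48 subspaces), each subspace covers 16 components, and under the stated assumption a 3072-byte raw vector is represented by a 48-byte code.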
Production Configuration
For production systems with continuous updates, enable both background persistence and optimization.
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    // On-disk storage
    .onDisk(true)
    .indexDirectory(Path.of("/data/vectors"))
    // Background persistence (async, non-blocking, enabled by setting interval > 0)
    .persistenceIntervalMs(30_000) // Enable, check every 30 seconds
    .minChangesBetweenPersists(100) // Only persist if >= 100 changes
    .persistOnShutdown(true) // Persist on close()
    // Background optimization (periodic cleanup, enabled by setting interval > 0)
    .optimizationIntervalMs(60_000) // Enable, check every 60 seconds
    .minChangesBetweenOptimizations(1000) // Only optimize if >= 1000 changes
    .optimizeOnShutdown(false) // Skip for faster shutdown
    .build();
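Each background task is governed by two settings: an interval at which it checks, and a minimum number of changes that must have accumulated before it actually runs. The sketch below illustrates that gating logic only. `ChangeGate` is a hypothetical class written for this example and makes no claim about how the library implements its background tasks.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of "check every interval, run only after N changes". */
final class ChangeGate
{
    private final long minChanges;
    private final AtomicLong pendingChanges = new AtomicLong();

    ChangeGate(long minChanges)
    {
        this.minChanges = minChanges;
    }

    /** Called for every add, update, or remove. */
    void recordChange()
    {
        this.pendingChanges.incrementAndGet();
    }

    /**
     * Called once per interval tick. Returns true if the task should run now,
     * and in that case subtracts the changes that were counted at check time.
     */
    boolean tryRun()
    {
        long pending = this.pendingChanges.get();
        if (pending < this.minChanges)
        {
            return false;
        }
        // Subtract instead of resetting so concurrent changes are not lost.
        this.pendingChanges.addAndGet(-pending);
        return true;
    }

    public static void main(String[] args)
    {
        ChangeGate gate = new ChangeGate(100);
        for (int i = 0; i < 150; i++)
        {
            gate.recordChange();
        }
        System.out.println("first tick runs: " + gate.tryRun());
        System.out.println("second tick runs: " + gate.tryRun());
    }
}
```

With this reading, a quiet index is never persisted or optimized just because the interval elapsed, which is why the thresholds are worth tuning to the write rate.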
Manual Optimization and Persistence
For fine-grained control, you can manually trigger optimization and persistence.
// Optimize graph (removes excess neighbors, improves query latency)
index.optimize();
// Persist to disk (for on-disk indices)
index.persistToDisk();
// Close index (runs shutdown hooks based on config)
index.close();
Multiple Vector Indices
You can create multiple vector indices for different embedding types on the same entity.
GigaMap<Document> gigaMap = GigaMap.New();
VectorIndices<Document> vectorIndices = gigaMap.index().register(VectorIndices.Category());
// Title embeddings (smaller dimension)
VectorIndexConfiguration titleConfig = VectorIndexConfiguration.builder()
    .dimension(384)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .build();
VectorIndex<Document> titleIndex = vectorIndices.add("title", titleConfig, new TitleVectorizer());
// Content embeddings (larger dimension)
VectorIndexConfiguration contentConfig = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .build();
VectorIndex<Document> contentIndex = vectorIndices.add("content", contentConfig, new ContentVectorizer());
Hybrid Search
Combine vector similarity search with traditional bitmap index filtering.
// First, filter by category using bitmap index
List<Long> categoryIds = bitmapIndex.query(
    categoryIndexer.is("technology")
).toList();
// Then run the vector search (top 10 nearest neighbors across the whole index)
VectorSearchResult<Document> result = vectorIndex.search(queryVector, 10);
// Keep only the hits that passed the category filter (this may leave fewer than 10)
List<Document> hybridResults = result.stream()
    .filter(e -> categoryIds.contains(e.entityId()))
    .map(VectorSearchResult.Entry::entity)
    .toList();
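Because the category filter is applied after the vector search, two details matter in practice: membership checks against a `List` are linear per hit, and filtering a top-10 result can leave fewer than 10 entries. A common workaround is to request more candidates than needed and trim after filtering. The sketch below shows that post-filter step with plain Java collections. `HybridFilter` and `Hit` are hypothetical stand-ins for illustration, not library types.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical post-filter step for a hybrid search; not part of the library. */
final class HybridFilter
{
    /** Stand-in for one vector search hit (entity id plus similarity score). */
    record Hit(long entityId, float score) {}

    /**
     * Keeps at most k hits whose id is in allowedIds, preserving the ranking order
     * of the input. Using a Set makes each membership check constant time on average.
     */
    static List<Hit> postFilter(List<Hit> rankedHits, Set<Long> allowedIds, int k)
    {
        return rankedHits.stream()
            .filter(hit -> allowedIds.contains(hit.entityId()))
            .limit(k)
            .toList();
    }

    public static void main(String[] args)
    {
        // Over-fetched candidates, best first.
        List<Hit> candidates = List.of(
            new Hit(1, 0.9f), new Hit(2, 0.8f), new Hit(3, 0.7f), new Hit(4, 0.6f));
        Set<Long> technologyIds = Set.of(2L, 4L, 5L);
        System.out.println(postFilter(candidates, technologyIds, 1));
    }
}
```

How many extra candidates to request depends on how selective the filter is; a very selective filter may need a much larger candidate set to fill k results.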
Vectorizer Implementations
Embedded Vectors
When the vector is stored directly in the entity, override isEmbedded() to return true to avoid duplicate storage.
public class DocumentVectorizer extends Vectorizer<Document>
{
    @Override
    public float[] vectorize(Document entity)
    {
        return entity.embedding();
    }

    @Override
    public boolean isEmbedded()
    {
        return true;
    }
}
Computed Vectors
When vectors are computed on the fly or fetched from an external service, override isEmbedded() to return false.
public class TextVectorizer extends Vectorizer<Document>
{
    private final EmbeddingService embeddingService;

    public TextVectorizer(EmbeddingService embeddingService)
    {
        this.embeddingService = embeddingService;
    }

    @Override
    public float[] vectorize(Document entity)
    {
        return embeddingService.embed(entity.text());
    }

    @Override
    public boolean isEmbedded()
    {
        return false; // Vector will be stored separately
    }
}
When using computed vectors with an external service, be aware of potential latency during indexing operations. Consider pre-computing embeddings and storing them in the entity for better performance.
The vectorize() method must never return null. If it does, an IllegalStateException is thrown when the entity is added, updated, or re-indexed. If some entities cannot produce a vector, they should be excluded before adding them to the GigaMap, or the vectorizer must provide a fallback vector.
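One way to exclude entities without a vector is to filter them before they are added. The sketch below is a minimal illustration using plain Java. `VectorGuard` is a hypothetical helper and the extractor function stands in for whatever produces the vector. It fits embedded vectors best, where the extractor is a cheap getter; with a remote embedding service this pre-check would compute each vector an extra time.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical pre-filter for entities that cannot produce a vector. */
final class VectorGuard
{
    /**
     * Returns only the entities for which vectorOf yields a non-null vector,
     * so that a vectorizer backed by the same function never returns null for them.
     */
    static <E> List<E> withVector(List<E> entities, Function<? super E, float[]> vectorOf)
    {
        return entities.stream()
            .filter(entity -> vectorOf.apply(entity) != null)
            .toList();
    }

    public static void main(String[] args)
    {
        // Strings stand in for entities; empty text has no embedding in this toy example.
        List<String> texts = List.of("alpha", "", "beta");
        List<String> indexable = withVector(texts, t -> t.isEmpty() ? null : new float[] {1f});
        System.out.println(indexable.size() + " of " + texts.size() + " entities can be indexed");
    }
}
```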
Benchmarking
The library includes benchmark tests that follow the ANN-Benchmarks methodology.
# Run benchmark tests (disabled by default)
mvn test -Dtest=VectorIndexBenchmarkTest \
-Djunit.jupiter.conditions.deactivate=org.junit.*DisabledCondition
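ANN-Benchmarks style evaluations typically report recall against exact nearest neighbors alongside query throughput. As a rough illustration of the recall part, the sketch below computes recall@k from two ranked id lists. `RecallAtK` is a hypothetical helper written for this example and is not taken from the library's benchmark tests.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical recall@k helper; not the library's benchmark code. */
final class RecallAtK
{
    /**
     * Fraction of the exact top-k ids that also appear in the approximate top-k.
     * Both lists are expected to be ordered best first.
     */
    static double recallAtK(List<Long> approximate, List<Long> exact, int k)
    {
        if (k <= 0)
        {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<Long> truth = new HashSet<>(exact.subList(0, Math.min(k, exact.size())));
        if (truth.isEmpty())
        {
            return 0.0;
        }
        long found = approximate.stream()
            .limit(k)
            .distinct()
            .filter(truth::contains)
            .count();
        return (double) found / truth.size();
    }

    public static void main(String[] args)
    {
        List<Long> approximate = List.of(1L, 2L, 3L, 4L);
        List<Long> exact = List.of(1L, 2L, 5L, 6L);
        System.out.println("recall@4 = " + recallAtK(approximate, exact, 4));
    }
}
```

In a full benchmark this value would be averaged over many queries, with the exact neighbors computed by brute force over the same dataset.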