JVector Index
The JVector index adds vector similarity search capabilities to entities stored in a GigaMap. Like the Bitmap and Lucene indices, it is registered with the GigaMap and automatically kept in sync as entities are added, updated, or removed.
Under the hood, the index uses JVector, a high-performance HNSW (Hierarchical Navigable Small World) graph implementation. This enables fast approximate k-nearest-neighbor (k-NN) search on vector embeddings, making it ideal for AI/ML applications like semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).
What is Vector Search?
Traditional database queries find exact matches: "find all customers where city = 'Berlin'" or "find products where price < 100". Vector search is different: it finds similar items based on meaning or characteristics, even when there is no exact match.
How it Works
- Embeddings: Each entity (product, customer, document) is converted into a vector, an array of numbers that represents its characteristics. These vectors are typically generated by AI/ML models that capture the semantic meaning of text, images, or behavior patterns.
- Similarity Search: Instead of matching exact values, vector search finds the entities whose vectors are closest to a query vector. "Closest" is determined by a similarity function (cosine, dot product, or Euclidean distance).
- Approximate Nearest Neighbors: Exact similarity search compares the query against every stored vector, which becomes too slow for large datasets. The JVector index instead navigates an HNSW graph structure to find approximate nearest neighbors, typically in milliseconds even with millions of vectors.
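To make the difference between exact and approximate search concrete, here is a minimal, self-contained sketch of exact (brute-force) k-NN using cosine similarity. It is not part of the JVector or GigaMap API; `BruteForceKnn` and its methods are hypothetical names for illustration. It compares the query with every stored vector, which is the linear cost an HNSW graph is designed to avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical exact k-NN over float[] embeddings using cosine similarity. */
final class BruteForceKnn
{
    /** A match: the position of a vector in the input list and its cosine similarity to the query. */
    record Match(int index, float similarity) {}

    /** Cosine similarity in [-1, 1]; zero-length vectors yield NaN. */
    static float cosine(float[] a, float[] b)
    {
        if (a.length != b.length)
        {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** Compares the query with every vector (O(n * d)) and keeps the k most similar. */
    static List<Match> search(List<float[]> vectors, float[] query, int k)
    {
        // Min-heap on similarity: the head is the weakest of the current top k.
        PriorityQueue<Match> heap = new PriorityQueue<>(Comparator.comparingDouble(Match::similarity));
        for (int i = 0; i < vectors.size(); i++)
        {
            heap.offer(new Match(i, cosine(vectors.get(i), query)));
            if (heap.size() > k)
            {
                heap.poll(); // drop the least similar candidate
            }
        }
        List<Match> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Match::similarity).reversed()); // most similar first
        return result;
    }
}
```

An exact scan like this is also a common way to produce ground truth for measuring how many true neighbors an approximate index returns (its recall) on a sample of queries.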
When to Use Vector Search
| Query Type | Index Type | Example |
|---|---|---|
| Exact match | Bitmap | |
| Full-text search | Lucene | |
| Similarity search | JVector | "Find products similar to this one" |
Use the JVector index when you need to:
- Find similar products, customers, or content
- Match questions to FAQ entries or support tickets
- Search by meaning rather than keywords
- Build recommendation systems
- Implement semantic search with AI embeddings
Integration with GigaMap
The JVector index integrates seamlessly with GigaMap’s indexing system. When you add, update, or remove entities from the GigaMap, the vector index is automatically updated. Search results return lazy references to your entities, so you can access the full object without additional lookups.
You can combine vector search with bitmap indices for hybrid queries, for example to find similar products within a specific category or price range.
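GigaMap's own API for combining indices is not shown in this section. As a conceptual sketch only, the following plain-Java code illustrates the idea behind a hybrid query: an attribute filter narrows the candidates (the role a bitmap index plays), and vector similarity ranks what remains. The `Product` record and the `HybridQuerySketch` class are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of "filter by attributes, then rank by similarity". */
final class HybridQuerySketch
{
    /** Hypothetical entity: attributes for filtering plus an embedding for ranking. */
    record Product(String name, String category, double price, float[] embedding) {}

    static double cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps products that pass the filter, then returns the k most similar to the query. */
    static List<Product> similarWithin(List<Product> products, Predicate<Product> filter, float[] query, int k)
    {
        return products.stream()
            .filter(filter) // attribute filter: the role a bitmap index plays
            .sorted(Comparator.comparingDouble((Product p) -> cosine(p.embedding(), query)).reversed())
            .limit(k)
            .toList();
    }
}
```

The full scan is only for illustration; in practice the candidate set would come from an index rather than from iterating over every entity.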
Features
- HNSW Vector Index: Fast approximate k-nearest-neighbor search using JVector’s HNSW graph implementation
- Persistent Storage: Vectors are stored in GigaMap for durability and lazy loading
- On-Disk Index: Memory-mapped graph storage for datasets larger than RAM
- PQ Compression: Product Quantization for reduced memory footprint
- Background Persistence: Automatic asynchronous persistence at configurable intervals
- Background Optimization: Periodic graph cleanup for improved query performance
- Lazy Entity Access: Search results provide direct access to entities without additional lookups
- Stream API: Java Stream support for search results
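Product Quantization saves memory by splitting each vector into subvectors and replacing each subvector with the index of its nearest centroid in a per-subspace codebook. The sketch below shows only encoding and decoding with codebooks that are assumed to be given; real PQ implementations, including JVector's, train the codebooks (typically with k-means) and differ in many details. `PqSketch` is a hypothetical class, not the library API.

```java
/**
 * Hypothetical Product Quantization sketch with pre-trained codebooks (training omitted).
 * Assumes every subspace has the same number of dimensions and at most 256 centroids.
 */
final class PqSketch
{
    private final float[][][] codebooks; // [subspace][centroid][component]
    private final int subDim;            // dimensions per subspace

    PqSketch(float[][][] codebooks)
    {
        this.codebooks = codebooks;
        this.subDim = codebooks[0][0].length;
    }

    /** Encodes a vector as one centroid index per subspace (one byte each). */
    byte[] encode(float[] vector)
    {
        byte[] code = new byte[codebooks.length];
        for (int s = 0; s < codebooks.length; s++)
        {
            int best = 0;
            double bestDistance = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++)
            {
                double d = squaredDistance(vector, s * subDim, codebooks[s][c]);
                if (d < bestDistance)
                {
                    bestDistance = d;
                    best = c;
                }
            }
            code[s] = (byte) best; // indices 128..255 wrap to negative bytes; decode undoes this
        }
        return code;
    }

    /** Rebuilds an approximation of the original vector by concatenating the chosen centroids. */
    float[] decode(byte[] code)
    {
        float[] vector = new float[code.length * subDim];
        for (int s = 0; s < code.length; s++)
        {
            float[] centroid = codebooks[s][Byte.toUnsignedInt(code[s])];
            System.arraycopy(centroid, 0, vector, s * subDim, subDim);
        }
        return vector;
    }

    private static double squaredDistance(float[] vector, int offset, float[] centroid)
    {
        double sum = 0;
        for (int i = 0; i < centroid.length; i++)
        {
            double diff = vector[offset + i] - centroid[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

With at most 256 centroids per subspace, each subspace costs one byte. For example, a 768-dimensional `float` vector occupies 768 * 4 = 3,072 bytes, while a code with 96 subspaces occupies 96 bytes, a 32x reduction, paid for with approximate distances.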
SIMD Acceleration (Panama Vector API)
JVector leverages the Panama Vector API (jdk.incubator.vector) for hardware-accelerated vector operations. This provides significant performance improvements for ANN indexing and search through SIMD (Single Instruction, Multiple Data) instructions.
Benefits
- Faster distance calculations: SIMD parallelizes vector arithmetic across CPU vector registers
- Accelerated indexing: Graph construction benefits from parallel similarity computations
- Faster queries: Nearest neighbor searches execute more quickly
- Optimized PQ encoding: Product Quantization compression uses SIMD for distance computations
Java Version Requirements
| Java Version | SIMD Support |
|---|---|
| Java 17-19 | Functional, but not optimized (scalar fallback) |
| Java 20+ | Full SIMD acceleration via Panama Vector API |
JVector uses a multi-release JAR structure:
- Base code targets Java 11 compatibility.
- Optimized vector code in `jvector-twenty` activates automatically on Java 20+ JVMs.
- Earlier Java versions receive functional but non-SIMD implementations.
JVM Parameters
To enable the Panama Vector API, add the incubator module to your JVM arguments:
```shell
java --add-modules jdk.incubator.vector -jar your-app.jar
```
See JVM Configuration for detailed setup instructions including Maven and Gradle configuration.
Tip: For optimal performance, run on Java 21 LTS or later to benefit from full SIMD acceleration. The performance difference can be substantial for large-scale vector operations.
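To confirm at runtime that the JVM flag took effect, a small stdlib-only check can inspect the running JVM. `SimdCheck` is a hypothetical helper that mirrors the Java Version Requirements table; it is not JVector's own detection logic. For applications on the class path, incubator modules are not resolved by default, so `jdk.incubator.vector` shows up in the boot layer only when it was added with `--add-modules`.

```java
/** Hypothetical runtime check: is the Panama Vector API module available to this JVM? */
final class SimdCheck
{
    /** True if the named module is resolved in the boot layer of the running JVM. */
    static boolean isModuleResolved(String moduleName)
    {
        return ModuleLayer.boot().findModule(moduleName).isPresent();
    }

    /** Per the Java Version Requirements table: Java 20+ and the incubator module resolved. */
    static boolean simdExpected()
    {
        return Runtime.version().feature() >= 20 && isModuleResolved("jdk.incubator.vector");
    }

    public static void main(String[] args)
    {
        System.out.println("Java feature release: " + Runtime.version().feature());
        System.out.println("jdk.incubator.vector resolved: " + isModuleResolved("jdk.incubator.vector"));
        System.out.println("SIMD path expected: " + simdExpected());
    }
}
```

Running this once with and once without `--add-modules jdk.incubator.vector` should show the module flip between resolved and not resolved.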
Installation
Add the following dependency to your Maven `pom.xml`:
```xml
<dependency>
    <groupId>org.eclipse.store</groupId>
    <artifactId>gigamap-jvector</artifactId>
    <version>4.0.0-beta1</version>
</dependency>
```
Example
First, we need to implement a Vectorizer, which extracts the vector embedding from entities.
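```java
// Assumes Document is an entity type that exposes its embedding, for example:
// record Document(String text, float[] embedding) {}
public class DocumentVectorizer extends Vectorizer<Document>
{
    @Override
    public float[] vectorize(Document entity)
    {
        return entity.embedding();
    }

    @Override
    public boolean isEmbedded()
    {
        return true; // Vector is stored in the entity itself (no duplicate storage)
    }
}
```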
Then we create a VectorIndex and register it with the GigaMap.
```java
// Create GigaMap and register vector indices
GigaMap<Document> gigaMap = GigaMap.New();
VectorIndices<Document> vectorIndices = gigaMap.index().register(VectorIndices.Category());

// Configure the vector index
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .build();

// Add the index with a name, configuration, and vectorizer
VectorIndex<Document> index = vectorIndices.add("embeddings", config, new DocumentVectorizer());
```
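Because the index is configured with a fixed dimension (768 here) and, per the Limitations, a `null` vector causes an `IllegalStateException`, it can help to validate embeddings before adding entities so that failures carry a clear message. `VectorChecks` is a hypothetical helper, not part of GigaMap; the finite-value check is an extra sanity check, not a documented requirement.

```java
/** Hypothetical pre-add validation for embeddings. */
final class VectorChecks
{
    /** Returns the vector unchanged if it is non-null, has the expected dimension, and is finite. */
    static float[] requireValid(float[] vector, int expectedDimension)
    {
        if (vector == null)
        {
            throw new IllegalArgumentException("vector must not be null");
        }
        if (vector.length != expectedDimension)
        {
            throw new IllegalArgumentException(
                "expected dimension " + expectedDimension + " but got " + vector.length);
        }
        for (float component : vector)
        {
            if (!Float.isFinite(component))
            {
                throw new IllegalArgumentException("vector contains NaN or infinity");
            }
        }
        return vector;
    }
}
```

Usage with the example types would look like `gigaMap.add(new Document("Hello world", VectorChecks.requireValid(embedding, 768)));`.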
After adding entities to the GigaMap, we can search for similar vectors.
```java
// embedding and queryVector are float[768] arrays produced by your embedding model

// Add entities (automatically indexed)
gigaMap.add(new Document("Hello world", embedding));

// Search for similar vectors (returns top 10 results)
VectorSearchResult<Document> result = index.search(queryVector, 10);
for (VectorSearchResult.Entry<Document> entry : result)
{
    Document doc = entry.entity();  // Lazy entity access
    float score = entry.score();    // Similarity score
    long id = entry.entityId();     // Entity ID
}
```
The search results support the Java Stream API for convenient filtering and transformation.
```java
List<Document> topDocs = result.stream()
    .filter(e -> e.score() > 0.8f)
    .map(VectorSearchResult.Entry::entity)
    .toList();
```
Similarity Functions
The following similarity functions are available:
| Function | Description |
|---|---|
| `COSINE` | Cosine similarity: compares vector direction and ignores magnitude. Best for text embeddings. |
| `DOT_PRODUCT` | Dot product similarity. Use when vectors are already normalized to unit length. |
| `EUCLIDEAN` | Euclidean distance. Best for geometric or spatial data. |
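The three functions can be written out in plain Java as follows. The sketch also shows why `DOT_PRODUCT` suits normalized vectors: after both vectors are scaled to unit length, their dot product equals their cosine similarity (up to rounding) without the two norm computations. The `*Score` methods are an assumption: they reproduce the transformations in Lucene's `VectorSimilarityFunction`, which JVector's enum of the same name is believed to follow, mapping each function to a score where higher means more similar. This section does not state whether `VectorSearchResult.Entry.score()` reports exactly these values, so verify before relying on thresholds such as `0.8f`. `SimilaritySketch` is a hypothetical class.

```java
/** Hypothetical plain-Java versions of the similarity functions (equal-length vectors assumed). */
final class SimilaritySketch
{
    static float dot(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.length; i++)
        {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float cosine(float[] a, float[] b)
    {
        return dot(a, b) / (float) (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    static float squaredEuclidean(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.length; i++)
        {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Scales v to unit length, so dot(normalize(a), normalize(b)) matches cosine(a, b). */
    static float[] normalize(float[] v)
    {
        float norm = (float) Math.sqrt(dot(v, v));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++)
        {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // ASSUMPTION: Lucene-style mappings to a "higher is more similar" score.
    static float cosineScore(float[] a, float[] b)
    {
        return (1 + cosine(a, b)) / 2;
    }

    static float dotProductScore(float[] a, float[] b)
    {
        return (1 + dot(a, b)) / 2; // lies in [0, 1] only for unit-length vectors
    }

    static float euclideanScore(float[] a, float[] b)
    {
        return 1 / (1 + squaredEuclidean(a, b));
    }
}
```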
Persistence with EclipseStore
Binary type handlers are registered automatically when using EclipseStore.
```java
// storageDir: the storage directory (java.nio.file.Path)
// config: a VectorIndexConfiguration built as in the Example section
try (EmbeddedStorageManager storage = EmbeddedStorage.start(storageDir))
{
    GigaMap<Document> gigaMap = GigaMap.New();
    storage.setRoot(gigaMap);

    VectorIndices<Document> vectorIndices = gigaMap.index().register(VectorIndices.Category());
    VectorIndex<Document> index = vectorIndices.add("embeddings", config, new DocumentVectorizer());

    gigaMap.add(new Document("text", embedding));
    storage.storeRoot();
}
```
Limitations
- Null vectors are not accepted: The `Vectorizer.vectorize()` method must never return `null`. If it does, an `IllegalStateException` is thrown. Ensure that every entity added to the GigaMap can produce a valid vector.
- ~2.1 billion vectors per index: JVector uses `int` for graph node ordinals. For larger datasets, implement sharding across multiple indices.
- PQ compression requires `maxDegree=32`: FusedPQ algorithm constraint (auto-enforced).
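The sharding scheme is left open here. One common approach, sketched below with hypothetical names (`ShardingSketch`, `Hit`), routes each entity to one of several indices, queries every shard for its top k, and merges the results. Merging is sound because each shard's own top k necessarily contains every global top-k hit stored in that shard; with approximate per-shard search, the merged result is approximate too. The sketch treats `Integer.MAX_VALUE` as the per-index ceiling and assumes scores are comparable across shards (same similarity function).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sharding helpers for splitting vectors across several indices. */
final class ShardingSketch
{
    /** Hypothetical search hit: entity id plus a score where higher means more similar. */
    record Hit(long entityId, float score) {}

    /** Assumed per-index ceiling, derived from int graph node ordinals. */
    static final long MAX_VECTORS_PER_INDEX = Integer.MAX_VALUE;

    /** Smallest number of shards (at least one) that keeps every index under the ceiling. */
    static int minShards(long vectorCount)
    {
        long shards = (vectorCount + MAX_VECTORS_PER_INDEX - 1) / MAX_VECTORS_PER_INDEX;
        return (int) Math.max(1, shards);
    }

    /** Hypothetical routing rule: assign entities to shards round-robin by id. */
    static int shardFor(long entityId, int shardCount)
    {
        return (int) Math.floorMod(entityId, (long) shardCount);
    }

    /** Merges each shard's top-k hits into a global top-k list, best first. */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k)
    {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard)
        {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble(Hit::score).reversed());
        return List.copyOf(all.subList(0, Math.min(k, all.size())));
    }
}
```

For example, 5 billion vectors need at least three shards under this ceiling.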