JVector Index

The JVector index adds vector similarity search capabilities to entities stored in a GigaMap. Like the Bitmap and Lucene indices, it is registered with the GigaMap and automatically kept in sync as entities are added, updated, or removed.

Under the hood, the index uses JVector, a high-performance HNSW (Hierarchical Navigable Small World) graph implementation. This enables fast approximate k-nearest-neighbor (k-NN) search on vector embeddings, making it ideal for AI/ML applications like semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).

What is Vector Search?

Traditional database queries find exact matches: "find all customers where city = 'Berlin'" or "find products where price < 100". Vector search is different: it finds similar items based on meaning or characteristics, even when there is no exact match.

How it Works

Embeddings: Each entity (product, customer, document) is converted into a vector, an array of numbers that represents its characteristics. These vectors are typically generated by AI/ML models that capture the semantic meaning of text, images, or behavior patterns.
Similarity Search: Instead of matching exact values, vector search finds the entities whose vectors are closest to a query vector. "Closest" is determined by a similarity function (cosine, dot product, or Euclidean distance).
Approximate Nearest Neighbors: Exact similarity search has to compare the query with every stored vector, which becomes too slow for large datasets. The JVector index uses an HNSW graph structure to find approximate nearest neighbors in milliseconds, even with millions of vectors.
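
To make the idea of "closest vectors" concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search in plain Java. It is an illustration only and not how the JVector index works internally; the class and method names (BruteForceKnn, cosine, nearest) are made up for this example. It scores every vector against the query, which is exactly the linear cost that an approximate graph search avoids.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Illustration only: exact k-NN over an in-memory array (hypothetical helper class).
final class BruteForceKnn
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    static double cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the positions of the k vectors most similar to the query, best first.
    // Every vector is scored, so the cost grows linearly with the dataset size.
    static int[] nearest(float[][] vectors, float[] query, int k)
    {
        return IntStream.range(0, vectors.length)
            .boxed()
            .sorted(Comparator.comparingDouble((Integer i) -> -cosine(vectors[i], query)))
            .limit(k)
            .mapToInt(Integer::intValue)
            .toArray();
    }

    public static void main(String[] args)
    {
        float[][] vectors = {
            {1.0f, 0.0f},   // 0: points along the x axis
            {0.0f, 1.0f},   // 1: points along the y axis
            {0.9f, 0.1f}    // 2: almost along the x axis
        };
        float[] query = {1.0f, 0.05f};
        System.out.println(Arrays.toString(nearest(vectors, query, 2)));
    }
}
```

With these three sample vectors the program prints [0, 2]: the two vectors pointing roughly in the query's direction rank ahead of the orthogonal one.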

When to Use Vector Search

Query Type	Index Type	Example
Exact match	Bitmap	`category = "Electronics"`
Full-text search	Lucene	`description contains "wireless bluetooth"`
Similarity search	JVector	"Find products similar to this one"

Use the JVector index when you need to:

Find similar products, customers, or content
Match questions to FAQ entries or support tickets
Search by meaning rather than keywords
Build recommendation systems
Implement semantic search with AI embeddings

Integration with GigaMap

The JVector index integrates seamlessly with GigaMap’s indexing system. When you add, update, or remove entities from the GigaMap, the vector index is automatically updated. Search results return lazy references to your entities, so you can access the full object without additional lookups.

You can combine vector search with bitmap indices for hybrid queries, for example to find similar products within a specific category or price range.

Features

HNSW Vector Index: Fast approximate k-nearest-neighbor search using JVector’s HNSW graph implementation
Persistent Storage: Vectors are stored in GigaMap for durability and lazy loading
On-Disk Index: Memory-mapped graph storage for datasets larger than RAM
PQ Compression: Product Quantization for reduced memory footprint
Background Persistence: Automatic asynchronous persistence at configurable intervals
Background Optimization: Periodic graph cleanup for improved query performance
Lazy Entity Access: Search results provide direct access to entities without additional lookups
Stream API: Java Stream support for search results

Requirements

Java 17+ (minimum)
Java 20+ (recommended for SIMD acceleration)

SIMD Acceleration (Panama Vector API)

JVector leverages the Panama Vector API (jdk.incubator.vector) for hardware-accelerated vector operations. This provides significant performance improvements for approximate nearest neighbor (ANN) indexing and search through SIMD (Single Instruction, Multiple Data) instructions.

Benefits

Faster distance calculations: SIMD parallelizes vector arithmetic across CPU vector registers
Accelerated indexing: Graph construction benefits from parallel similarity computations
Faster queries: Nearest neighbor searches execute more quickly
Optimized PQ encoding: Product Quantization compression uses SIMD for distance computations

Java Version Requirements

Java Version	SIMD Support
Java 17-19	Functional, but not optimized (scalar fallback)
Java 20+	Full SIMD acceleration via Panama Vector API

JVector uses a multi-release JAR structure:

Base code targets Java 11 compatibility (the GigaMap integration itself requires Java 17+, see Requirements)
Optimized vector code in jvector-twenty activates automatically on Java 20+ JVMs
Earlier Java versions receive functional but non-SIMD implementations

JVM Parameters

To enable the Panama Vector API, add the incubator module to your JVM arguments:

java --add-modules jdk.incubator.vector -jar your-app.jar

See JVM Configuration for detailed setup instructions including Maven and Gradle configuration.

For optimal performance, run on Java 21 LTS or later to benefit from full SIMD acceleration. The performance difference can be substantial for large-scale vector operations.
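
If you want to verify at startup that the flag took effect, the following sketch checks whether the incubator module was resolved in the boot layer. It uses only the standard ModuleLayer API; the class name VectorModuleCheck is made up for this example. Note that it only reports whether the module is available to the JVM. Whether JVector actually selects its SIMD code path is decided inside JVector and is not verified here.

```java
// Illustration only: reports whether the incubating Vector API module was resolved at startup.
final class VectorModuleCheck
{
    static final String MODULE_NAME = "jdk.incubator.vector";

    // True when the module is part of the boot layer, for example because the JVM
    // was started with --add-modules jdk.incubator.vector.
    static boolean vectorModuleResolved()
    {
        return ModuleLayer.boot().findModule(MODULE_NAME).isPresent();
    }

    public static void main(String[] args)
    {
        System.out.println("Java feature version: " + Runtime.version().feature());
        System.out.println(MODULE_NAME + " resolved: " + vectorModuleResolved());
    }
}
```

Incubator modules are not resolved by default, so running this class without the --add-modules flag is expected to report false.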

Installation

Maven [pom.xml]

<dependency>
    <groupId>org.eclipse.store</groupId>
    <artifactId>gigamap-jvector</artifactId>
    <version>4.0.0-beta1</version>
</dependency>

Example

First, we need to implement a Vectorizer, which extracts the vector embedding from entities.

public class DocumentVectorizer extends Vectorizer<Document>
{
    @Override
    public float[] vectorize(Document entity)
    {
        return entity.embedding();
    }

    @Override
    public boolean isEmbedded()
    {
        return true; // Vector is stored in entity (no duplicate storage)
    }
}

Then we create a VectorIndex and register it with the GigaMap.

// Create GigaMap and register vector indices
GigaMap<Document> gigaMap = GigaMap.New();
VectorIndices<Document> vectorIndices = gigaMap.index().register(VectorIndices.Category());

// Configure the vector index
// (the dimension must match the length of the float[] returned by the Vectorizer)
VectorIndexConfiguration config = VectorIndexConfiguration.builder()
    .dimension(768)
    .similarityFunction(VectorSimilarityFunction.COSINE)
    .build();

// Add the index with a name, configuration, and vectorizer
VectorIndex<Document> index = vectorIndices.add("embeddings", config, new DocumentVectorizer());

After adding entities to the GigaMap, we can search for similar vectors.

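// Add entities (automatically indexed)
// embedding is a float[] of the configured dimension (768 in this example)
gigaMap.add(new Document("Hello world", embedding));

// Search for similar vectors (returns top 10 results)
// queryVector is a float[] of the same dimension, produced by the same embedding model
VectorSearchResult<Document> result = index.search(queryVector, 10);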

for (VectorSearchResult.Entry<Document> entry : result)
{
    Document doc = entry.entity();    // Lazy entity access
    float score = entry.score();      // Similarity score
    long id = entry.entityId();       // Entity ID
}

The search results support the Java Stream API for convenient filtering and transformation.

List<Document> topDocs = result.stream()
    .filter(e -> e.score() > 0.8f)
    .map(VectorSearchResult.Entry::entity)
    .toList();

Similarity Functions

The following similarity functions are available:

Function	Description
`COSINE`	Cosine similarity, normalized for direction. Best for text embeddings.
`DOT_PRODUCT`	Dot product similarity. Use when vectors are already normalized.
`EUCLIDEAN`	Euclidean distance. Best for geometric or spatial data.
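
The three functions can be compared with a small plain-Java sketch. These are the textbook definitions, written here only for illustration; the class SimilarityMath and its methods are hypothetical and are not part of the JVector or GigaMap API, and the scores returned by the index may be scaled differently. The sketch also shows why DOT_PRODUCT is suggested for normalized vectors: once both vectors have unit length, the dot product equals the cosine similarity.

```java
// Illustration only: textbook definitions, not the scoring code used by JVector.
final class SimilarityMath
{
    static double dot(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.length; i++)
        {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity ignores vector length and compares direction only.
    static double cosine(float[] a, float[] b)
    {
        return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
    }

    // Euclidean distance: smaller values mean more similar vectors.
    static double euclideanDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.length; i++)
        {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Scales a vector to unit length so that dot product and cosine agree.
    static float[] normalize(float[] v)
    {
        double length = Math.sqrt(dot(v, v));
        float[] result = new float[v.length];
        for (int i = 0; i < v.length; i++)
        {
            result[i] = (float) (v[i] / length);
        }
        return result;
    }
}
```

For example, for a = (3, 4) and b = (4, 3) the raw dot product is 24, while the cosine similarity is 24 / 25 = 0.96. After normalizing both vectors, the dot product is also 0.96 (up to float rounding).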

Persistence with EclipseStore

Binary type handlers are registered automatically when using EclipseStore.

try (EmbeddedStorageManager storage = EmbeddedStorage.start(storageDir))
{
    GigaMap<Document> gigaMap = GigaMap.New();
    storage.setRoot(gigaMap);

    // config is the VectorIndexConfiguration built in the example above
    VectorIndices<Document> vectorIndices = gigaMap.index().register(VectorIndices.Category());
    VectorIndex<Document> index = vectorIndices.add("embeddings", config, new DocumentVectorizer());

    gigaMap.add(new Document("text", embedding));

    storage.storeRoot();
}

Limitations

Null vectors are not accepted: The Vectorizer.vectorize() method must never return null. If it does, an IllegalStateException is thrown. Ensure that every entity added to the GigaMap can produce a valid vector.
~2.1 billion vectors per index: JVector uses int for graph node ordinals, so a single index cannot hold more than Integer.MAX_VALUE (2,147,483,647) vectors. For larger datasets, implement sharding across multiple indices.
PQ compression requires maxDegree=32: This is a constraint of the FusedPQ algorithm and is enforced automatically.
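
Sharding is left to the application. One possible approach, sketched below under stated assumptions, is to run the same query against every shard and merge the per-shard top-k lists into a global top-k. The Hit record and the ShardMerge class are hypothetical stand-ins for this example and are not part of the GigaMap API; in a real application each Hit would be built from one search result entry of one shard. The shard number is kept alongside the entity ID because entity IDs from different GigaMaps can collide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: merges results from several independent indices (shards).
final class ShardMerge
{
    // Hypothetical stand-in for one search result of one shard.
    record Hit(int shard, long entityId, float score) {}

    // Merges the per-shard top-k lists into one global top-k list, best score first.
    // Assumes that a higher score means a more similar vector in every shard.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k)
    {
        List<Hit> all = new ArrayList<>();
        perShard.forEach(all::addAll);
        all.sort(Comparator.<Hit>comparingDouble(Hit::score).reversed());
        return List.copyOf(all.subList(0, Math.min(k, all.size())));
    }
}
```

Because each shard already returns its own best k results, the merged list contains the best k results across all shards, within the limits of the approximate search in each shard. Scores are only comparable if all shards use the same similarity function and embedding model.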

Examples

GigaMap vector examples
