Lucene Index
The Lucene index adds full-text search capabilities to entities stored in a GigaMap. Like the Bitmap and JVector indices, it is registered with the GigaMap and automatically kept in sync as entities are added, updated, or removed.
Under the hood, the index uses Apache Lucene, the de facto industry standard for full-text indexing in the Java ecosystem. This enables powerful text search features including tokenization, stemming, relevance scoring, and the full Lucene query syntax.
What is Full-Text Search?
Traditional database queries match exact values: firstName = "John" finds only entities whose first name is exactly "John". Full-text search instead analyzes the text itself:
- Tokenization: "Machine Learning Tutorial" becomes searchable as "machine", "learning", "tutorial"
- Case insensitivity: "PYTHON" matches "python" and "Python"
- Relevance scoring: Results are ranked by how well they match
- Partial matching: Wildcards like Py* find "Python", "Pygame", etc.
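To make these behaviors concrete, here is a minimal plain-Java sketch of a toy inverted index. It is not how Lucene is implemented, and the class and method names (TinyTextIndex, tokenize, term, prefix) are hypothetical. It only illustrates tokenization, case-insensitive term lookup, and prefix matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Lucene's implementation.
class TinyTextIndex
{
    // Maps each lower-cased token to the ids of the documents containing it.
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    // Splits text on non-alphanumeric characters and lower-cases every token.
    static List<String> tokenize(String text)
    {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
        {
            if (!token.isEmpty())
            {
                tokens.add(token);
            }
        }
        return tokens;
    }

    void add(int docId, String text)
    {
        for (String token : tokenize(text))
        {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Term lookup; the query term is normalized the same way as the indexed text.
    Set<Integer> term(String term)
    {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }

    // Prefix lookup, comparable to a trailing wildcard such as Py*.
    Set<Integer> prefix(String prefix)
    {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<Integer> result = new TreeSet<>();
        // Keys sharing the prefix are contiguous in the sorted map.
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(p, true).entrySet())
        {
            if (!e.getKey().startsWith(p))
            {
                break;
            }
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args)
    {
        TinyTextIndex index = new TinyTextIndex();
        index.add(1, "Machine Learning Tutorial");
        index.add(2, "PYTHON for beginners");
        index.add(3, "Pygame basics");

        System.out.println(tokenize("Machine Learning Tutorial")); // [machine, learning, tutorial]
        System.out.println(index.term("Python"));                  // [2]
        System.out.println(index.prefix("Py"));                    // [2, 3]
    }
}
```

The sketch leaves out relevance scoring. A real engine also ranks the matching documents rather than returning them as an unordered set.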
When to Use Full-Text Search
| Query Type | Index Type | Example |
|---|---|---|
| Exact match | Bitmap | |
| Full-text search | Lucene | |
| Similarity search | JVector | "Find products similar to this one" |
Use the Lucene index when you need to:
- Search within text content (descriptions, articles, comments)
- Find documents by keywords regardless of exact phrasing
- Rank results by relevance
- Support advanced search syntax (AND, OR, phrases, wildcards)
- Build search-as-you-type features
Installation
<dependency>
<groupId>org.eclipse.store</groupId>
<artifactId>gigamap-lucene</artifactId>
<version>4.0.0-beta1</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>${lucene.version}</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-queryparser</artifactId>
<version>${lucene.version}</version>
</dependency>
Example
First, implement a DocumentPopulator that maps your entity fields to Lucene document fields.
public class ArticleDocumentPopulator extends DocumentPopulator<Article>
{
    @Override
    public void populate(Document document, Article article)
    {
        document.add(createTextField("title", article.getTitle()));
        document.add(createTextField("content", article.getContent()));
        document.add(createStringField("author", article.getAuthor()));
    }
}
Then create a LuceneContext and register it with the GigaMap.
// Create context with document populator
LuceneContext<Article> context = LuceneContext.New(
    Paths.get("lucene-index"),      // Directory for index files
    new ArticleDocumentPopulator()  // Maps entities to documents
);
// Create GigaMap and register Lucene index
GigaMap<Article> articles = GigaMap.New();
LuceneIndex<Article> luceneIndex = articles.index().register(LuceneIndex.Category(context));
// Add entities (automatically indexed)
articles.add(new Article("Python Guide", "Learn Python programming...", "Jane"));
articles.add(new Article("Java Tips", "Best practices for Java...", "John"));
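The examples use an Article entity without showing it. A minimal sketch that fits the constructor call and the getters used in the populator could look like the following. Only the three constructor arguments and the getter names come from the examples; the immutability, null checks, and toString are assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical entity: only the constructor arguments and getter names
// are taken from the examples; everything else is an assumption.
class Article
{
    private final String title;
    private final String content;
    private final String author;

    Article(String title, String content, String author)
    {
        // Reject nulls early so the populator never sees a null field value.
        this.title   = Objects.requireNonNull(title, "title");
        this.content = Objects.requireNonNull(content, "content");
        this.author  = Objects.requireNonNull(author, "author");
    }

    public String getTitle()   { return this.title; }
    public String getContent() { return this.content; }
    public String getAuthor()  { return this.author; }

    @Override
    public String toString()
    {
        return "Article[" + this.title + " by " + this.author + "]";
    }
}
```

In a real project the class would normally be public and live in its own file; it is package-private here only to keep the sketch self-contained.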
Query using the Lucene query syntax.
// Simple field search
List<Article> pythonArticles = luceneIndex.query("title:Python");
// Full-text search across content
List<Article> programmingArticles = luceneIndex.query("content:programming");
// Boolean queries
List<Article> javaByJohn = luceneIndex.query("title:Java AND author:John");
// Wildcard search
List<Article> pyArticles = luceneIndex.query("title:Py*");
// Phrase search
List<Article> exactPhrase = luceneIndex.query("content:\"best practices\"");
Field Types
The DocumentPopulator provides helper methods for different field types:
| Method | Lucene Field Type | Use Case |
|---|---|---|
| createTextField | TextField (analyzed) | Full-text searchable content. Tokenized and indexed. |
| createStringField | StringField (not analyzed) | Exact match only. Use for IDs, categories, status values. |
| createIntField | IntPoint | Numeric integer values. Supports range queries. |
| | LongPoint | Numeric long values. Supports range queries. |
| | FloatPoint | Numeric float values. Supports range queries. |
| createDoubleField | DoublePoint | Numeric double values. Supports range queries. |
Field Type Examples
public class ProductDocumentPopulator extends DocumentPopulator<Product>
{
    @Override
    public void populate(Document document, Product product)
    {
        // Full-text searchable
        document.add(createTextField("name", product.getName()));
        document.add(createTextField("description", product.getDescription()));

        // Exact match only
        document.add(createStringField("sku", product.getSku()));
        document.add(createStringField("category", product.getCategory()));

        // Numeric fields for range queries
        document.add(createDoubleField("price", product.getPrice()));
        document.add(createIntField("quantity", product.getQuantity()));
    }
}
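The practical difference between an analyzed text field and a non-analyzed string field can be sketched in plain Java. The helper names (matchesAnalyzed, matchesExact) and sample values are hypothetical. The code only mimics the matching behavior of indexed values. It does not model how query text is analyzed before matching, which depends on the configured analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: mimics matching behavior, not Lucene's implementation.
class FieldMatchingSketch
{
    // Analyzed (text) field: the value is split into lower-cased tokens,
    // and a term matches if it equals any one of those tokens.
    static boolean matchesAnalyzed(String fieldValue, String term)
    {
        List<String> tokens = Arrays.asList(fieldValue.toLowerCase(Locale.ROOT).split("\\W+"));
        return tokens.contains(term.toLowerCase(Locale.ROOT));
    }

    // Non-analyzed (string) field: the whole value is indexed as one term,
    // so only the complete, identically cased value matches.
    static boolean matchesExact(String fieldValue, String term)
    {
        return fieldValue.equals(term);
    }

    public static void main(String[] args)
    {
        // A single word finds the text field regardless of case.
        System.out.println(matchesAnalyzed("Wireless Noise-Cancelling Headphones", "headphones")); // true

        // The string field needs the full, exact value.
        System.out.println(matchesExact("Electronics", "Electronics")); // true
        System.out.println(matchesExact("Electronics", "electronics")); // false
        System.out.println(matchesExact("AB-1234", "AB"));              // false
    }
}
```

This is why identifiers such as a SKU fit a string field: a partial or differently cased value should not match.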
Query Syntax
The Lucene index supports the full Lucene query syntax.
Basic Queries
| Query | Description |
|---|---|
| title:Python | Field contains term "Python" |
| Python | Default field contains "Python" |
| content:"best practices" | Phrase search (exact sequence) |
| title:Py* | Wildcard (starts with "Py") |
| title:Python~ | Fuzzy search (similar spelling) |
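"Similar spelling" in a fuzzy search is measured by edit distance: Lucene's fuzzy queries accept terms within a small number of single-character edits (at most two). The sketch below computes the classic Levenshtein distance in plain Java to show what that number means. It is a simplification: Lucene by default also counts the swap of two adjacent characters as a single edit, which this version does not. The class name EditDistance is hypothetical.

```java
// Illustrative only: classic Levenshtein distance, without transpositions.
class EditDistance
{
    // Minimum number of single-character insertions, deletions, and
    // substitutions needed to turn a into b.
    static int levenshtein(String a, String b)
    {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++)
        {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++)
            {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost                      // substitution or match
                );
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args)
    {
        System.out.println(levenshtein("pythn", "python"));  // 1 (one missing letter)
        System.out.println(levenshtein("pyhton", "python")); // 2 (swap counted as two substitutions)
        System.out.println(levenshtein("java", "python"));   // 6 (no characters in common)
    }
}
```

Under this measure a misspelling such as "pythn" is within reach of a fuzzy query for "python", while an unrelated word such as "java" is not.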
Result Handling
Basic Query
// Returns all matching entities
List<Article> results = luceneIndex.query("title:Python");
Limited Results
// Returns at most 10 results
List<Article> top10 = luceneIndex.query("title:Python", 10);
With Relevance Scores
// Access entity ID, entity, and relevance score
luceneIndex.query("title:Python", (entityId, article, score) -> {
    System.out.println(article.getTitle() + " (score: " + score + ")");
});
// With result limit
luceneIndex.query("title:Python", 10, (entityId, article, score) -> {
    System.out.println(article.getTitle() + " (score: " + score + ")");
});
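If the scored hits are needed after the query rather than just printed, the callback can collect them into a list. The sketch below is self-contained and does not use the library. Hit and HitHandler are hypothetical stand-ins that mirror the three-argument lambda shape shown above, and the long and float parameter types are assumptions, since the real callback type is not named here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: Hit and HitHandler are hypothetical, not library types.
class ScoredHits
{
    // Simple holder for one result.
    static final class Hit<E>
    {
        final long  entityId;
        final E     entity;
        final float score;

        Hit(long entityId, E entity, float score)
        {
            this.entityId = entityId;
            this.entity   = entity;
            this.score    = score;
        }
    }

    // Stand-in for the (entityId, entity, score) callback shape.
    @FunctionalInterface
    interface HitHandler<E>
    {
        void accept(long entityId, E entity, float score);
    }

    // Returns a handler that appends every hit it receives to the target list.
    static <E> HitHandler<E> collectInto(List<Hit<E>> target)
    {
        return (entityId, entity, score) -> target.add(new Hit<>(entityId, entity, score));
    }

    // Orders hits from highest to lowest score.
    static <E> void sortByScoreDescending(List<Hit<E>> hits)
    {
        hits.sort((a, b) -> Float.compare(b.score, a.score));
    }

    public static void main(String[] args)
    {
        List<Hit<String>> hits = new ArrayList<>();
        HitHandler<String> handler = collectInto(hits);

        // Simulated callbacks; with the real index the query would invoke the handler.
        handler.accept(1L, "Java Tips", 0.3f);
        handler.accept(2L, "Python Guide", 1.2f);

        sortByScoreDescending(hits);
        for (Hit<String> hit : hits)
        {
            System.out.println(hit.entityId + " " + hit.entity + " (score: " + hit.score + ")");
        }
    }
}
```

With the actual index, the same lambda body (adding to a list) could be passed directly to the query call. The explicit sort is included because this document does not state the order in which hits are delivered.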
Persistence
The Lucene index supports three storage options:
Embedded Storage (Default)
Stores index data inside the GigaMap’s object graph. The index is persisted automatically when the GigaMap is stored.
// No directory specified - uses embedded storage
LuceneContext<Article> context = LuceneContext.New(
    new ArticleDocumentPopulator()
);
Integration with EclipseStore
try (EmbeddedStorageManager storage = EmbeddedStorage.start(storageDir))
{
    GigaMap<Article> articles = GigaMap.New();
    storage.setRoot(articles);

    LuceneContext<Article> context = LuceneContext.New(new ArticleDocumentPopulator());
    LuceneIndex<Article> luceneIndex = articles.index().register(LuceneIndex.Category(context));

    articles.add(new Article("Title", "Content", "Author"));
    storage.storeRoot();
}