lucene-dev mailing list archives

From "Karthick Sankarachary (JIRA)" <>
Subject [jira] Created: (LUCENE-2425) An Anti-Merging Multi-Directory Indexing Framework
Date Sat, 01 May 2010 21:25:57 GMT
An Anti-Merging Multi-Directory Indexing Framework

                 Key: LUCENE-2425
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/*, Index
    Affects Versions: 3.0.1
            Reporter: Karthick Sankarachary

By design, a Lucene index merges documents spread across many segments into fewer segments.
This optimizes its directory structure, which in turn leads to better search performance.
In particular, the index relies on a merge policy to specify the set of merge operations
that should be performed when the index is optimized.

Often, there's a need to do the exact opposite, which is to "split" the documents. This
calls for a mechanism that facilitates sub-division of documents based on a certain (ideally,
user-defined) algorithm. By way of example, one may wish to sub-divide (or partition) documents
based on parameters such as time, space, real-timeliness, and so on. Herein, we describe an
indexing framework that builds on the Lucene index writer and reader, to address use cases
wherein documents need to diverge rather than converge.

In brief, it associates zero or more sub-directories with the index's directory, which serve
to complement it in some manner. The sub-directories (a.k.a. splits) are managed by a split
policy, which is notified of all changes made to the index directory (a.k.a. super-directory),
thus allowing it to modify its sub-directories as it sees fit. To make the index reader and
writer "observable", we extend Lucene's reader and writer with the goal of providing hooks
into every method that could potentially change the index. This allows for propagation of
such changes to the split policy, which essentially acts as a listener on the index.
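
To illustrate the listener arrangement described above, here is a minimal sketch in plain Java.
It assumes nothing about the actual patch: the names ObservableWriterSketch, SplitPolicy,
ObservableWriter and RecordingPolicy are hypothetical, and the writer is a stand-in that keeps
documents in a list rather than extending Lucene's IndexWriter. It only shows how each mutating
method might forward the change to the policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these types come from Lucene or from the patch.
class ObservableWriterSketch {

    /** Listener that is told about every change made to the super-directory. */
    interface SplitPolicy {
        void onDocumentAdded(String doc);
        void onDocumentsDeleted(String term);
        void onCommit();
    }

    /** Stand-in for an index writer; a real one would build on Lucene's writer. */
    static class ObservableWriter {
        private final List<String> docs = new ArrayList<>();
        private final SplitPolicy policy;

        ObservableWriter(SplitPolicy policy) {
            this.policy = policy;
        }

        void addDocument(String doc) {
            docs.add(doc);                // change the (simulated) index first
            policy.onDocumentAdded(doc);  // then propagate the change to the policy
        }

        void deleteDocuments(String term) {
            docs.removeIf(d -> d.contains(term));
            policy.onDocumentsDeleted(term);
        }

        void commit() {
            policy.onCommit();
        }

        int numDocs() {
            return docs.size();
        }
    }

    /** A policy that only records the events it receives, in order. */
    static class RecordingPolicy implements SplitPolicy {
        final List<String> events = new ArrayList<>();

        public void onDocumentAdded(String doc) { events.add("add:" + doc); }
        public void onDocumentsDeleted(String term) { events.add("delete:" + term); }
        public void onCommit() { events.add("commit"); }
    }

    public static void main(String[] args) {
        RecordingPolicy policy = new RecordingPolicy();
        ObservableWriter writer = new ObservableWriter(policy);
        writer.addDocument("apache lucene");
        writer.addDocument("apache solr");
        writer.deleteDocuments("solr");
        writer.commit();
        System.out.println(policy.events);
        System.out.println(writer.numDocs());
    }
}
```

A real split policy would react to these callbacks by creating, updating or dropping
sub-directories instead of just recording them.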

We refer to each sub-directory (or split) and the super-directory as a sub-index of the containing
index (a.k.a. the split index). Note that a sub-directory need not be co-located
with the super-directory. Furthermore, the split policy in turn relies on one or more split
rules to determine when to add or remove sub-directories. This allows for a clear separation
of the event that triggers a split from the management of those splits.
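
The separation between split rules and the split policy could look roughly like the sketch
below. All names (SplitRuleSketch, SplitRule, RuleBasedPolicy) are hypothetical and not taken
from the patch. Sub-directories are modeled as in-memory lists, and the sketch assumes a single
kind of rule, one that fires once the current split holds a given number of documents. For
simplicity it starts with one split already present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; sub-directories are modeled as in-memory lists.
class SplitRuleSketch {

    /** Decides when a split should happen; knows nothing about directories. */
    interface SplitRule {
        boolean shouldSplit(int docsInCurrentSplit);
    }

    /** Manages the splits; consults its rules to learn when to add one. */
    static class RuleBasedPolicy {
        private final List<SplitRule> rules;
        private final List<List<String>> splits = new ArrayList<>();

        RuleBasedPolicy(List<SplitRule> rules) {
            this.rules = rules;
            splits.add(new ArrayList<>()); // simplification: begin with one split
        }

        void onDocumentAdded(String doc) {
            List<String> current = splits.get(splits.size() - 1);
            for (SplitRule rule : rules) {
                if (rule.shouldSplit(current.size())) {
                    // A rule fired: open a new split and route the document there.
                    current = new ArrayList<>();
                    splits.add(current);
                    break;
                }
            }
            current.add(doc);
        }

        List<List<String>> splits() {
            return splits;
        }
    }

    public static void main(String[] args) {
        SplitRule maxThreeDocs = count -> count >= 3;
        RuleBasedPolicy policy =
                new RuleBasedPolicy(Collections.singletonList(maxThreeDocs));
        for (int i = 1; i <= 7; i++) {
            policy.onDocumentAdded("doc" + i);
        }
        System.out.println(policy.splits());
    }
}
```

Swapping in a time-based rule would change when splits occur without touching the code that
manages them, which is the separation the paragraph above describes.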

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.