lucene-dev mailing list archives

From Michael Busch <>
Subject Parallel incremental indexing
Date Sun, 30 Aug 2009 06:23:04 GMT
Hi all,

I just added a wiki page for a new feature I'd like to add to
Lucene. Please take a look at the link. I will add more details and
diagrams to the page, but for now it should give a rough idea about
how to implement it:

The basic idea is to allow partial document updates, i.e. updating
only a subset of a document's fields without having to reindex the
entire document. This is a feature that is asked for very often.
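
To illustrate the idea in miniature, here is a toy sketch. It is not
the design from the wiki page and not the IBM implementation; it only
assumes that fields are split into groups, that each group is kept in
its own store, and that the stores stay aligned by document ID. The
class name ParallelIndexSketch, the group names, and the write counter
are all hypothetical and exist only for this example.

```python
class ParallelIndexSketch:
    """Toy model: each field group lives in its own store, aligned by doc ID."""

    def __init__(self, field_groups):
        # field_groups maps a (hypothetical) group name to the field names it owns.
        self.field_groups = {name: set(fields) for name, fields in field_groups.items()}
        self.stores = {name: {} for name in field_groups}
        # Count writes per group so the effect of a partial update is visible.
        self.writes = {name: 0 for name in field_groups}

    def _group_of(self, field):
        for name, fields in self.field_groups.items():
            if field in fields:
                return name
        raise KeyError(f"unknown field: {field}")

    def add_document(self, doc_id, fields):
        # Split the document across the groups; every group gets an entry
        # for doc_id so the stores stay aligned.
        parts = {name: {} for name in self.stores}
        for field, value in fields.items():
            parts[self._group_of(field)][field] = value
        for name, part in parts.items():
            self.stores[name][doc_id] = part
            self.writes[name] += 1

    def update_field(self, doc_id, field, value):
        # Only the store that owns this field is written to.
        name = self._group_of(field)
        self.stores[name][doc_id][field] = value
        self.writes[name] += 1

    def get_document(self, doc_id):
        # Reading merges the aligned entries from all stores.
        merged = {}
        for store in self.stores.values():
            merged.update(store[doc_id])
        return merged


index = ParallelIndexSketch({"content": {"title", "body"}, "metadata": {"tags"}})
index.add_document(1, {"title": "Hello", "body": "long text", "tags": "draft"})
index.update_field(1, "tags", "published")
```

After the update, only the "metadata" store has been written a second
time; the large "content" fields were not touched, which is the point
of updating a subset of the fields without reindexing the whole
document.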

We have implemented this solution at IBM and it's working
great. The technology has already allowed us to add really exciting
new features to products that weren't easily possible before.

The implementation I can currently contribute has some limitations:
e.g. multi-threaded indexing is not supported. But let me make clear
that this is not a limitation of the design described in the wiki. We
have these limitations because we implemented this on top of the
Lucene 2.4 APIs. If we decide to add this to Lucene's core, we should
reimplement some parts to overcome those limitations.

In my opinion this will be a great addition to Lucene that many
people will find very useful. Solr users often ask for this as well.

Over the last few weeks I worked on getting internal approval for the
contribution to Lucene, and the good news is that I already have a
signed software grant ready. So if the community likes this feature
and decides to add it to Lucene, there won't be any delay for legal
work from IBM's side.

Btw: I will be on vacation from 09/03-09/20 and won't have internet
access most of the time, so if I stop responding at the end of next
week you'll know why...

Please let me know what you think!

