lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kay Röpke <>
Subject Re: realtime indexing
Date Fri, 16 Nov 2007 11:42:59 GMT

On Nov 16, 2007, at 11:59 AM, Antoine Baudoux wrote:

> 	I'm trying to implement a similar solution.
> 	Could you be more precise on how you handle duplicates, as well as  
> document deletion?

The key probably is (it was for us, anyway) that you have a fast way  
of determining whether or not a given document is in an index.
We use (and John et al, too, I suppose) the unique id (!= doc id) each  
document has for that purpose. The basic idea for that should be in  
the archives.

So, back to the question:
By definition anything in the RAM index is newer than anything on  
disk, so documents found in the RAM index should supersede docs from  
disk when they have the same unique id (user id, primary key, whatever).
When you have the hits of the query you can easily determine duplicate  
primary keys, and for those you look up from which index they came (by  
asking an enhanced MultiReader that knows the indices and their doc id  
ranges). The last operation obviously has to be very fast, thus we use  
out custom id => docid mapping mechanism (and I think John is using  
his own, too).

There are probably even more clever ways of doing this, but it should  
give you an idea. :)


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message