lucene-java-user mailing list archives

From Daniel Noll
Subject Re: Document ID shuffling under 2.3.x (on merge?)
Date Thu, 13 Mar 2008 05:00:11 GMT
On Thursday 13 March 2008 00:42:59 Erick Erickson wrote:
> I certainly found that lazy loading changed my speed dramatically, but
> that was on a particularly field-heavy index.
> I wonder if TermEnum/TermDocs would be fast enough on an indexed
> (UN_TOKENIZED???) field for a unique id.
> Mostly, I'm hoping you'll try this and tell me if it works so I don't have
> to sometime <G>....

I added a "uid" field to our existing fields.  After the load there were some 
gaps in the values for this field; presumably those were documents where 
adding the doc failed and adding the fallback doc also failed.  The index 
contains 20004 documents.  I ran each test for 10 iterations; the times below 
are the average of the last 5, since it took around 5 rounds to warm up.
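
For anyone wanting to repeat the measurement, the warm-up handling described above could be sketched roughly as follows. This is not the code used for these numbers; the class and method names (WarmupTimer, averageOfLast, timeMillis) are hypothetical, and the task passed in stands for whichever operation is being timed.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene and not the
// original test code.
final class WarmupTimer {

    // Mean of the last `keep` samples. Earlier samples are treated as
    // warm-up rounds and ignored.
    static double averageOfLast(long[] samples, int keep) {
        if (keep <= 0 || keep > samples.length) {
            throw new IllegalArgumentException("keep must be in 1..samples.length");
        }
        return Arrays.stream(samples, samples.length - keep, samples.length)
                     .average()
                     .getAsDouble();
    }

    // Runs the task `iterations` times and returns the average wall-clock
    // time in milliseconds over the last `keep` runs.
    static double timeMillis(Runnable task, int iterations, int keep) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return averageOfLast(samples, keep) / 1_000_000.0;
    }
}
```

With this sketch, the setup described above would correspond to something like WarmupTimer.timeMillis(task, 10, 5).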

Filter building, for a filter returning 1000 randomly selected documents:

   Time to build filter by UID (100% Derby) - 93ms
   Additional time to build filter by DocID - 12ms (13% penalty)

13% penalty is acceptable IMO.  The problem comes next.

Bulk operation building, for a query returning around 2800 documents:

   Time to build the bulkop by DocID (100% Hits) - 6ms
   Time to fetch the "uid" field from the document - 152ms (2600% penalty)
   Time to do the DB query (not counting commit though) - 10ms

For interest's sake I also timed fetching the document with no FieldSelector; 
that takes around 410ms for the same documents.  So there is still a big 
benefit in using the field selector, but it isn't anywhere near enough to 
get it close to the time it takes to retrieve the doc IDs.

