lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3932) Improve load time of .tii files
Date Thu, 29 Mar 2012 18:58:22 GMT


Robert Muir commented on LUCENE-3932:

Our index has 600,000,000 terms. This is an index of 10,000,000 emails, with associated attachments.
We generate a lot of garbage terms when parsing, things like time stamps, malformed attachments
which parse badly, etc.

For an index like that, have you tried specifying termInfosIndexDivisor to your IndexReader
as well?
If that works with acceptable performance, you could then remove it and adjust termIndexInterval
at write time to get a smaller .tii file.
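
To make the trade-off concrete, here is a rough sketch of the arithmetic behind the two settings. It is not Lucene code: the class and method names are hypothetical, and it assumes that one index entry is written to the .tii for every termIndexInterval terms, that the reader keeps every termInfosIndexDivisor-th of those entries, and that the write-time interval is 128 (the 3.x default, as far as I know).

```java
public class TermIndexEstimate {
    // Approximate number of term-index entries held in memory by a reader.
    // Assumption: one entry per termIndexInterval terms is written to .tii,
    // and the reader loads every termInfosIndexDivisor-th written entry.
    static long loadedIndexTerms(long totalTerms, int termIndexInterval, int termInfosIndexDivisor) {
        long written = (totalTerms + termIndexInterval - 1) / termIndexInterval; // ceiling division
        return (written + termInfosIndexDivisor - 1) / termInfosIndexDivisor;    // ceiling division
    }

    public static void main(String[] args) {
        long totalTerms = 600_000_000L; // term count quoted above

        // Assumed interval of 128, no divisor.
        System.out.println(loadedIndexTerms(totalTerms, 128, 1));
        // Same index read with a divisor of 4: fewer entries in memory, same .tii on disk.
        System.out.println(loadedIndexTerms(totalTerms, 128, 4));
        // Index rewritten with interval 512: same entries in memory, smaller .tii on disk.
        System.out.println(loadedIndexTerms(totalTerms, 512, 1));
    }
}
```

Under these assumptions a divisor of 4 and an interval of 512 leave the same number of entries in memory, which is why the divisor is a cheap way to test the effect before reindexing.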

> Improve load time of .tii files
> -------------------------------
>                 Key: LUCENE-3932
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.5
>         Environment: Linux
>            Reporter: Sean Bridges
> We have a large 50 GB index, optimized to a single segment, with a 66 MB .tii file.
> This index has no norms and no field cache.
> It takes about 5 seconds to load this index. Profiling reveals that 60% of the time is
> spent in GrowableWriter.set(index, value), and most of the time in set(...) is spent resizing
> PackedInts.Mutable current.
> In the constructor for TermInfosReaderIndex, you initialize the writer with the line,
> {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);{quote}
> For our index using four as the bit estimate results in 27 resizes.
> The last value in indexToTerms is going to be ~ tiiFileLength, and if instead you use,
> {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
> GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);{quote}
> Load time improves to ~ 2 seconds.
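
For reference, the bit estimate proposed in the description can be tried in isolation. The sketch below is standalone Java, not Lucene code: the class and method names are hypothetical, and it assumes "66 MB" means 66 * 1024 * 1024 bytes. It also shows an integer-only alternative that avoids floating-point logarithms.

```java
public class BitEstimate {
    // The estimate from the issue description: ceil(log2(tiiFileLength)).
    static int bitEstimate(long tiiFileLength) {
        return (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
    }

    // Integer-only alternative: the number of bits needed to represent the value.
    // Note it yields one more bit than ceil(log2(n)) when n is an exact power of two.
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    public static void main(String[] args) {
        // Assumption: the 66 MB .tii file is 66 * 2^20 bytes long.
        long tiiFileLength = 66L * 1024 * 1024;

        System.out.println(bitEstimate(tiiFileLength));  // 27 under the stated assumption
        System.out.println(bitsRequired(tiiFileLength)); // 27 under the stated assumption
    }
}
```

Starting the writer at roughly this width, rather than at 4 bits, is what lets it avoid growing as larger offsets are added.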
