From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1052) Add an "termInfosIndexDivisor" to IndexReader
Date Sun, 18 Nov 2007 11:15:43 GMT


Michael McCandless commented on LUCENE-1052:

I'd like to take it one step further to eliminate the need to call IndexReader.setTermInfosIndexDivisor
up front.  The idea is to instead specify a maximum number of index terms to cache in memory.
 This could then allow TermInfosReader to set indexDivisor automatically to the smallest value
that yields a cache size less than the maximum.
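
For illustration only, the selection rule described above might be sketched as follows. The class and method names are hypothetical and are not part of Lucene or of the attached patch; the sketch assumes the cap is inclusive (cache size at most the maximum) and that loading with divisor d keeps roughly ceil(indexTermCount / d) index terms in memory.

```java
// Hypothetical helper, not Lucene API: picks the smallest divisor whose
// sub-sampled term index fits within a caller-supplied cap.
final class DivisorChooser {

    /** Number of index terms held in RAM for a given divisor (assumed: ceil(n / d)). */
    static long cachedTerms(long indexTermCount, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        return (indexTermCount + divisor - 1) / divisor;
    }

    /** Smallest divisor d such that cachedTerms(indexTermCount, d) <= maxCachedTerms. */
    static int smallestDivisor(long indexTermCount, long maxCachedTerms) {
        if (maxCachedTerms < 1) {
            throw new IllegalArgumentException("maxCachedTerms must be >= 1");
        }
        if (indexTermCount <= maxCachedTerms) {
            return 1; // everything already fits, no sub-sampling needed
        }
        // ceil(indexTermCount / maxCachedTerms) is the smallest divisor that fits.
        long d = (indexTermCount + maxCachedTerms - 1) / maxCachedTerms;
        return (int) Math.min(d, Integer.MAX_VALUE);
    }
}
```

With 1000 index terms and a cap of 300, this picks a divisor of 4 (250 terms cached), since a divisor of 3 would still leave 334 terms.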

This seems a simple and extremely useful extension.  Unfortunately, I'm still on an older
Lucene, but will post my update.  If you like this idea, you may want to just add the feature
directly to your implementation in the trunk.

Good idea!  This allows you to simply outright cap the memory usage,
rather than having memory usage be a fraction of the number of terms
and thus grow as your term count grows.

So you would propose replacing IndexReader.setTermInfosIndexDivisor
with IndexReader.setTermInfosIndexMaxCount or some such?  I.e., you would
still need to call this when creating your reader...

> Add an "termInfosIndexDivisor" to IndexReader
> ---------------------------------------------
>                 Key: LUCENE-1052
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>         Attachments: LUCENE-1052.patch
> The termIndexInterval, set at indexing time, lets you trade off
> how much RAM a reader uses to load the indexed terms against the cost
> of seeking to the specific term you want to load.
> But the downside is that you must set it at indexing time.
> This issue adds an indexDivisor to TermInfosReader so that on opening
> a reader you can further sub-sample the termIndexInterval to use
> less RAM.  E.g., a setting of 2 means every (2 * termIndexInterval)th
> term is loaded into RAM.
> This is particularly useful if your index has a great many terms (e.g.,
> you accidentally indexed binary terms).
> Spinoff from this thread:
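
A rough sketch of the sub-sampling described in the issue, for illustration only: the class and method names below are hypothetical and do not reflect the actual TermInfosReader code in the patch. It assumes the on-disk term index is an ordered list of entries and that a divisor of d keeps every d-th entry, starting with the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: keeps every indexDivisor-th
// entry of an already-written term index when loading it into RAM.
final class IndexSubSampler {

    /** Returns entries 0, d, 2d, ... of the on-disk index (assumed layout). */
    static <T> List<T> subSample(List<T> indexEntries, int indexDivisor) {
        if (indexDivisor < 1) {
            throw new IllegalArgumentException("indexDivisor must be >= 1");
        }
        int capacity = (indexEntries.size() + indexDivisor - 1) / indexDivisor;
        List<T> kept = new ArrayList<>(capacity);
        for (int i = 0; i < indexEntries.size(); i += indexDivisor) {
            kept.add(indexEntries.get(i));
        }
        return kept;
    }

    /** Spacing, in terms, between entries that end up in RAM. */
    static long effectiveInterval(int termIndexInterval, int indexDivisor) {
        return (long) termIndexInterval * indexDivisor;
    }
}
```

Under these assumptions, a termIndexInterval of 128 with a divisor of 2 leaves one in-memory entry per 256 terms, which halves the RAM used by the term index at the cost of a longer scan after each seek.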
