lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <>
Subject [jira] Commented: (LUCENE-2257) relax the per-segment max unique term limit
Date Thu, 11 Feb 2010 21:30:28 GMT


Tom Burton-West commented on LUCENE-2257:

Hi Michael,

Thanks for your help. We mounted one of the indexes with 2.4 billion terms on our dev server
and tested with and without the patch. (I discovered that queries containing Korean characters
consistently trigger the bug.) With the patch, we don't see any ArrayIndexOutOfBounds
exceptions. We are going to do a bit more testing before we put this into production. (We
temporarily rolled back our production indexes to indexes that split the terms over 2 segments
and therefore don't trigger the bug.)

Other than walking through the code in the debugger, is there some systematic way of looking
for any other places where an int is used that might also have problems when we have over
2.1 billion terms?
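
For readers unfamiliar with the failure mode being asked about, here is a minimal sketch of how a term position above Integer.MAX_VALUE misbehaves once it is narrowed to an int. The class and method names are hypothetical and are not taken from Lucene or from the patch; the only assumption is that some code path casts a 64-bit position to int before using it as an array index.

```java
public class TermOrdinalSketch {
    // Hypothetical helper: narrows a 64-bit term position to int,
    // the way an int-based code path would.
    static int narrowToInt(long termPosition) {
        return (int) termPosition; // silently wraps above Integer.MAX_VALUE
    }

    // Checked variant: fails loudly instead of wrapping.
    static int checkedToInt(long termPosition) {
        return Math.toIntExact(termPosition); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        long termPosition = 2_400_000_000L; // above the signed 32-bit limit
        // The wrapped value is negative, so using it as an array index
        // would raise ArrayIndexOutOfBoundsException.
        System.out.println(narrowToInt(termPosition)); // prints -1894967296
        try {
            checkedToInt(termPosition);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Swapping unchecked casts for Math.toIntExact in a test build is one way to make such spots fail at the cast rather than later at an array access, though it only finds explicit narrowing, not counters that were declared as int from the start.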


> relax the per-segment max unique term limit
> -------------------------------------------
>                 Key: LUCENE-2257
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>         Attachments: LUCENE-2257.patch
> Lucene can't handle more than 2.1B (limit of signed 32 bit int) unique terms in a single
> segment. But I think we can improve this to termIndexInterval (default 128) * 2.1B. There is
> one place (internal API only) where Lucene uses an int but should use a long.
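
As a rough illustration of the proposed bound, the arithmetic can be worked out as below. This is only a sketch of the multiplication stated in the issue summary; the class and method names are hypothetical, and it assumes the limit is exactly termIndexInterval times Integer.MAX_VALUE.

```java
public class TermLimitSketch {
    // Hypothetical helper: the relaxed per-segment limit described in the
    // issue summary, termIndexInterval * (signed 32-bit maximum).
    static long relaxedLimit(int termIndexInterval) {
        // Widen to long before multiplying so the product itself cannot overflow.
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);  // prints 2147483647
        System.out.println(relaxedLimit(128));  // prints 274877906816
    }
}
```

With the default interval of 128 this comes to roughly 275 billion unique terms.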

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
