lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2257) relax the per-segment max unique term limit
Date Wed, 10 Feb 2010 16:35:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832054#action_12832054 ]

Tom Burton-West commented on LUCENE-2257:
-----------------------------------------

Thanks for the patch, Michael,

The patch worked fine with CheckIndex.  CheckIndex ran successfully against an index with
2.49 billion terms.
I added commas to the output below:
 test: terms, freq, prox...OK [2,487,224,745 terms; 23,573,976,855 terms/docs pairs; 97,223,318,067 tokens]

We are still working out how to test it with Solr.  The ArrayIndexOutOfBounds exception
appears in the logs for about 1 in every 100 queries.  We haven't been able to determine
which queries trigger the problem.

We are using an older version of Solr with Lucene 2.9-dev 779312 - 2009-05-27 17:19:55.
I'm not sure whether we can just drop in a later version of Lucene with the patch, or whether
I need to patch the older 2.9-dev Lucene version that came with our Solr.  What do you suggest?

What I'm thinking of is running 10,000 queries against our dev server, pointed at one of the
large-segment indexes, with and without the patch.
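
A minimal sketch of such a replay run, in plain Java. Everything here is hypothetical:
QueryRunner stands in for whatever actually sends a query to the server under test, and it
is assumed to surface server-side failures as exceptions (for example by running in-process,
or by translating an HTTP error response). The point is only to record which queries fail,
so the same list can be replayed with and without the patch and the results compared.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryReplay {
    /** Hypothetical hook: sends one query to the server under test. */
    interface QueryRunner {
        void run(String query) throws Exception;
    }

    /**
     * Replays every query once and returns the failing ones, in replay order,
     * mapped to the simple class name of the exception they raised.
     */
    static Map<String, String> replay(List<String> queries, QueryRunner runner) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String query : queries) {
            try {
                runner.run(query);
            } catch (Exception e) {
                // Keep the query text so the trigger can be identified later.
                failures.put(query, e.getClass().getSimpleName());
            }
        }
        return failures;
    }
}
```

Running replay() against the unpatched and the patched server with the same query list gives
two failure maps; queries present only in the first one are candidates for the problem.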

Tom




> relax the per-segment max unique term limit
> -------------------------------------------
>
>                 Key: LUCENE-2257
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2257
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2257.patch
>
>
> Lucene can't handle more than 2.1B (limit of signed 32 bit int) unique terms in a single segment.
> But I think we can improve this to termIndexInterval (default 128) * 2.1B.  There is one place
> (internal API only) where Lucene uses an int but should use a long.
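
The arithmetic behind those two limits can be checked with a short, self-contained Java
sketch. The class and method names are made up for illustration and are not Lucene APIs;
the only inputs taken from this thread are the default termIndexInterval of 128 and the
term count reported by CheckIndex. The narrowing cast at the end only shows what happens
when such a count is forced into an int; it is not a claim about where the exception in
the logs comes from.

```java
import java.util.Locale;

class TermLimitArithmetic {
    // Default termIndexInterval quoted in the issue description.
    static final int TERM_INDEX_INTERVAL = 128;

    /** Proposed ceiling: termIndexInterval * (largest signed 32-bit int). */
    static long relaxedLimit(int termIndexInterval) {
        // Widen to long before multiplying so the product cannot overflow.
        return (long) Integer.MAX_VALUE * termIndexInterval;
    }

    /** True when a term count no longer fits in a signed 32-bit int. */
    static boolean overflowsInt(long termCount) {
        return termCount > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long reported = 2_487_224_745L; // unique terms reported by CheckIndex above
        System.out.println(String.format(Locale.US, "int limit:      %,d", Integer.MAX_VALUE));
        System.out.println(String.format(Locale.US, "relaxed limit:  %,d", relaxedLimit(TERM_INDEX_INTERVAL)));
        System.out.println("exceeds int:    " + overflowsInt(reported));
        // Narrowing to int wraps around to a negative value.
        System.out.println("as int:         " + (int) reported);
    }
}
```

With the default interval the relaxed ceiling is 274,877,906,816 unique terms per segment,
so the 2.49 billion terms in the index above exceed the old limit but sit far below the new one.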

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

