lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes
Date Tue, 20 Nov 2012 20:34:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501447#comment-13501447
] 

Simon Willnauer commented on LUCENE-4547:
-----------------------------------------

one way of merging the two approaches would be a simple boolean that forces on-disk access.
that way we can solve the following problems:

 * default merge impls. a format doesn't need to override merge if its an in-ram by default
impl
 * people can build auxiliary structures on top without worrying about ram
 * leave decisions to the codec what in-memory structure or on-disk structure it uses by default

this way we can keep the api clean and support expert users. We can even make this package
private so that "normal" users won't see it?
                
> DocValues field broken on large indexes
> ---------------------------------------
>
>                 Key: LUCENE-4547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4547
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>
>         Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against svn revision
1406416, before the change).
> But i found docvalues is already broken here for large indexes that have a PackedLongDocValues
field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1<<33L); 
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread
#0,6,TGRP-Test2GBDocValues]
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException:
-65536
> [junit4:junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message