lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes
Date Tue, 20 Nov 2012 20:12:58 GMT


Simon Willnauer commented on LUCENE-4547:

bq. If an expert app really needs to pick & choose ram vs disk dynamically, depending on
how many other indices are open and how much RAM they are using, etc., they can always make
a custom DV format ...

What I am worried about is the lack of communication between the app and the codec. Something
like this is going to be a major hassle. All I am asking for is a way to pass "hints" to the
codec about what I need at a certain point, per field. We can't do this and I think we shouldn't
allow this. It's an encoding / decoding layer and it should be simple. Pushing what you call
"experts" to write their own codecs is a major trap, I think. Writing a codec is a last resort
and causes major trouble for non-Lucene devs IMO. This is expertexpert :)

I really like the idea of perfieldDV and I think we should do it. I am just not a big fan
of making up-front decisions about on-disk vs. RAM for this stuff. PostingsFormat
is a different story: the on-disk (low RAM usage) formats have such perf characteristics that you
are very unlikely to need something else that uses lots of RAM. For sorting, grouping or scoring you
will certainly need that.

> DocValues field broken on large indexes
> ---------------------------------------
>                 Key: LUCENE-4547
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>         Attachments: test.patch
> I tried to write a test to sanity check LUCENE-4536 (first running against svn revision 1406416, before the change).
> But I found docvalues is already broken here for large indexes that have a PackedLongDocValues field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1L << 33); // long shift; 1<<33L would be evaluated as an int shift
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
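
A side note on the shift expression in a test like the one above: in Java the result type of a shift follows the left operand, so an `L` suffix on the shift distance alone does not make the shift 64-bit. The sketch below only illustrates that language rule; `ShiftWidthDemo` is a hypothetical name and nothing here is Lucene code.

```java
// Illustration only: ShiftWidthDemo is a hypothetical name, not part of Lucene.
final class ShiftWidthDemo {
  // int left operand: the shift distance is masked to its low 5 bits (33 & 31 == 1),
  // so this is an int shift by 1 that is then widened to long.
  static long intShift() {
    return 1 << 33L;
  }

  // long left operand: the shift distance is masked to its low 6 bits,
  // so a distance of 33 is used as written.
  static long longShift() {
    return 1L << 33;
  }

  public static void main(String[] args) {
    System.out.println(intShift());   // prints 2
    System.out.println(longShift());  // prints 8589934592
  }
}
```

Only the second form yields a value that needs more than 32 bits.
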
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException:
> [junit4:junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at org.apache.lucene.util.ByteBlockPool.deref(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.DocValuesConsumer.merge(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.PerDocConsumer.merge(
> {noformat}
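
A negative array index like the one in the trace above is the kind of value that shows up when a byte offset is computed in 32-bit arithmetic and wraps around. The sketch below shows that general mechanism only. It assumes 8 bytes per value, which the report does not state, and it does not reproduce Lucene's actual code or the exact -65536 index; `OffsetOverflowDemo` and its members are hypothetical names.

```java
// Illustration only: hypothetical names, not Lucene code.
final class OffsetOverflowDemo {
  static final int NUM_DOCS = 500_000_000;  // document count from the test in the report
  static final int BYTES_PER_VALUE = 8;     // assumption: fixed 8-byte slots

  // Offset computed in int arithmetic: wraps once docId * 8 exceeds Integer.MAX_VALUE.
  static int intOffset(int docId) {
    return docId * BYTES_PER_VALUE;
  }

  // Offset computed in long arithmetic: widening before the multiply avoids the wrap.
  static long longOffset(int docId) {
    return (long) docId * BYTES_PER_VALUE;
  }

  public static void main(String[] args) {
    int lastDoc = NUM_DOCS - 1;
    System.out.println("int offset:  " + intOffset(lastDoc));   // negative after wrap-around
    System.out.println("long offset: " + longOffset(lastDoc));  // 3999999992
  }
}
```

Under these assumptions the int offset first wraps at document 2^28 (268,435,456), well below the 500,000,000 documents the test indexes.
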
