lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes
Date Mon, 19 Nov 2012 11:54:59 GMT


Simon Willnauer commented on LUCENE-4547:

hey folks,

I looked at the branch and I would suggest we move a little slower here; we are doing
too many things at once. In particular, I really don't like the trend of making FieldCache the
single source for caching. FieldCache has many problems in my opinion: it uses this DEFAULT
singleton, and it has a single way of caching things per reader, while some users might want
different access to DV. In ES, for example, we don't use FieldCache at all, for many reasons.
I think we are going in the right direction here, but exposing everything through FC is a no-go IMO.
I do see why we should merge the interfaces and expose un-inverted fields via the new DV interface
- nice! But hiding it behind FC is no good.
I also don't like the way "in-ram" DV are exposed. I don't think we should have newRAMInstance()
on the interface. Let's keep the interface clean and not mix in how it is represented. I'd
rather vote for dedicated producers or SimpleDocValuesProducer#getNumericDocValues(boolean
inMemory). Then we can still do caching on top. The producer should really be simple and shouldn't
do caching. We can also separate the default in-memory impls into a simple helper class with
methods like static NumericDocValues load(NumericDocValues directDocValues).
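
To make the proposed layering concrete, here is a rough sketch: a producer that only hands out
direct values, a static helper that loads them into RAM, and a cache sitting on top of both.
Everything in it is an assumption for illustration, not the code on the branch: the two interfaces
are simplified stand-ins, the maxDoc parameter is added because the helper needs to know how many
documents to copy, and InMemoryDocValues and CachingDocValues are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-document numeric accessor.
interface NumericDocValues {
    long get(int docID);
}

// Hypothetical producer: returns direct (e.g. disk-backed) values and does no caching.
interface SimpleDocValuesProducer {
    NumericDocValues getNumericDocValues(String field);
    int maxDoc();
}

// Default in-memory representation kept in a helper class, off the interface.
final class InMemoryDocValues {
    private InMemoryDocValues() {}

    // Copies every value into a long[] once; maxDoc is an assumed extra parameter.
    static NumericDocValues load(NumericDocValues directDocValues, int maxDoc) {
        final long[] values = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = directDocValues.get(doc);
        }
        return docID -> values[docID];
    }
}

// Caching layered on top of the producer; callers who want a different policy
// can skip this class and talk to the producer directly.
final class CachingDocValues {
    private final SimpleDocValuesProducer producer;
    private final Map<String, NumericDocValues> cache = new HashMap<>();

    CachingDocValues(SimpleDocValuesProducer producer) {
        this.producer = producer;
    }

    synchronized NumericDocValues getNumeric(String field) {
        NumericDocValues cached = cache.get(field);
        if (cached == null) {
            NumericDocValues direct = producer.getNumericDocValues(field);
            cached = InMemoryDocValues.load(direct, producer.maxDoc());
            cache.put(field, cached);
        }
        return cached;
    }
}
```

The point of the split is that the producer stays trivial, the RAM representation is an
implementation detail of the helper, and the caching policy can be swapped without touching either.
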
> DocValues field broken on large indexes
> ---------------------------------------
>                 Key: LUCENE-4547
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>         Attachments: test.patch
> I tried to write a test to sanity check LUCENE-4536 (first running against svn revision
> 1406416, before the change).
> But i found docvalues is already broken here for large indexes that have a PackedLongDocValues
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1<<33L); 
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException:
> [junit4:junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at org.apache.lucene.util.ByteBlockPool.deref(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.DocValuesConsumer.merge(
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.PerDocConsumer.merge(
> {noformat}
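
A side note on the quoted snippet: in Java the expression 1<<33L is an int shift, because the
result type follows the left operand, and for an int only the low five bits of the shift distance
are used. The small check below illustrates that language rule. ShiftWidthDemo is a hypothetical
name, and the report does not say whether this affects the failure shown in the trace.

```java
final class ShiftWidthDemo {
    // Left operand is an int, so the distance is masked to 33 & 31 = 1.
    static long intShift() {
        return 1 << 33L;
    }

    // Left operand is a long, so the full distance of 33 is used.
    static long longShift() {
        return 1L << 33;
    }

    public static void main(String[] args) {
        System.out.println(intShift());  // prints 2
        System.out.println(longShift()); // prints 8589934592
    }
}
```
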
