lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes
Date Thu, 15 Nov 2012 21:17:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498345#comment-13498345
] 

Adrien Grand commented on LUCENE-4547:
--------------------------------------

I'm having a look at the branch, and it looks great! I like the fact that there are less types
and that values are buffered into memory so that the doc values format can make decisions
depending on the number of distinct values, ... Still, I have some questions on what you plan
to do with this branch:
 - do you plan to use this branch to:
   - fix other issues such as LUCENE-3862?
   - merge the FieldCache / FunctionValues / DocValues.Source APIs?
 - are you going to remove DocValues.Type.FLOAT_*?
 - are SimpleDVConsumer and SimpleDocValuesFormat going to replace PerDocConsumer and DocValuesFormat?
 - are you going to remove hasArray/getArray?
 - will there still be a direct=true|false option at load-time or will it depend on the format
impl (potentially with a PerFieldPerDocProducer similarly to the postings formats)?
                
> DocValues field broken on large indexes
> ---------------------------------------
>
>                 Key: LUCENE-4547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4547
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>
>         Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against svn revision
1406416, before the change).
> But i found docvalues is already broken here for large indexes that have a PackedLongDocValues
field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1<<33L); 
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread
#0,6,TGRP-Test2GBDocValues]
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException:
-65536
> [junit4:junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message