Date: Fri, 16 Nov 2012 14:50:13 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Message-ID: <94280497.123959.1353077413862.JavaMail.jiratomcat@arcas>
In-Reply-To: <829569411.80739.1352299992530.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (LUCENE-4547) DocValues field broken on large indexes

[ https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498838#comment-13498838 ]

Robert Muir commented on LUCENE-4547:
-------------------------------------

What would the flush/merge API look like? Would it get simpler or more complicated? Could we still require certain stats from the Producer, so that we can have a default, efficient in-RAM Source impl?

{quote}
I think there are many possible optimizations based on how much lengths vary, whether bytes refs share prefixes or not, ...
{quote}

Maybe, but arguably we should do the simplest possible thing that can work given the codecs we have today. When designing these APIs, to me these are the only ones that exist...

> DocValues field broken on large indexes
> ---------------------------------------
>
>                 Key: LUCENE-4547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4547
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Blocker
>             Fix For: 4.1
>
>         Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against svn revision 1406416, before the change).
> But I found docvalues is already broken here for large indexes that have a PackedLongDocValues field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L); // force > 32bit deltas
>   } else {
>     field.setLongValue(1L << 33); // long shift; 1<<33L is an int shift and evaluates to 2
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,6,TGRP-Test2GBDocValues]
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2> 	at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4]   2> 	at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}
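
Two pieces of integer arithmetic are relevant to the reproduction and the trace above, and a small standalone sketch may help when reading them. First, in Java the result type of a shift comes from the left operand, so `1 << 33L` is an int shift (distance masked to 33 & 31 = 1) while `1L << 33` is a true 64-bit shift. Second, 500,000,000 documents at 8 bytes each is 4,000,000,000 bytes, which does not fit in an int; an int byte offset that wraps to Integer.MIN_VALUE, shifted right by 15, gives exactly -65536. The class and method names below are hypothetical, the 32 KB block size (shift of 15) is an assumption, and whether the Lucene 4.0 writer computes its offsets this way is also an assumption rather than something the trace confirms.

```java
// Hypothetical sketch; names are illustrative and are not Lucene API.
final class DocValuesOverflowSketch {

    // ASSUMPTION: 32 KB blocks, i.e. 1 << 15 bytes per block.
    static final int ASSUMED_BLOCK_SHIFT = 15;

    // The shift result takes the type of the LEFT operand. With an int on the
    // left, only the low five bits of the distance are used (33 & 31 == 1).
    static int intShift() {
        return 1 << 33L;
    }

    // With a long on the left, the full 64-bit shift is performed.
    static long longShift() {
        return 1L << 33;
    }

    // Byte offset of a fixed-width value computed in int arithmetic (may wrap).
    static int intByteOffset(int docId, int bytesPerValue) {
        return docId * bytesPerValue;
    }

    // Same offset computed in long arithmetic (no wrap at these sizes).
    static long longByteOffset(int docId, int bytesPerValue) {
        return (long) docId * bytesPerValue;
    }

    // Block index derived from an int byte offset, under the assumed shift.
    static int blockIndex(int byteOffset) {
        return byteOffset >> ASSUMED_BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(intShift());   // 2
        System.out.println(longShift());  // 8589934592

        // 268,435,456 is the first doc whose 8-byte offset reaches 2^31.
        int docId = 1 << 28;
        int wrapped = intByteOffset(docId, 8);
        System.out.println(wrapped);                   // -2147483648
        System.out.println(blockIndex(wrapped));       // -65536
        System.out.println(longByteOffset(docId, 8));  // 2147483648
    }
}
```

The long variants show the usual remedy under these assumptions: widen before multiplying or shifting, so the value is never squeezed through a 32-bit intermediate.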