Date: Wed, 28 Oct 2009 11:29:28 -0400
Message-ID: <9ac0c6aa0910280829r3f4f167ava6a41c983353c392@mail.gmail.com>
Subject: Re: IO exception during merge/optimize
From: Michael McCandless
To: java-user@lucene.apache.org

On Wed, Oct 28, 2009 at
10:58 AM, Peter Keegan wrote:

> The only change I made to the source code was the patch for PayloadNearQuery
> (LUCENE-1986).

That patch certainly shouldn't lead to this.

> It's possible that our content contains U+FFFF. I will run in debugger and
> see.

OK may as well check just so we cover all possibilities.

> The data is 'sensitive', so I may not be able to provide a bad segment,
> unfortunately.

OK, maybe we can modify your CheckIndex instead. Let's start with
this, which prints a warning whenever the docFreq differs but
otherwise continues (vs throwing RuntimeException). I'm curious how
many terms show this, and whether the TermEnum keeps working after
this term that has different docFreq:

Index: src/java/org/apache/lucene/index/CheckIndex.java
===================================================================
--- src/java/org/apache/lucene/index/CheckIndex.java (revision 829889)
+++ src/java/org/apache/lucene/index/CheckIndex.java (working copy)
@@ -672,8 +672,8 @@
           }
           if (freq0 + delCount != docFreq) {
-            throw new RuntimeException("term " + term + " docFreq=" +
-                                       docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
+            System.out.println("WARNING: term " + term + " docFreq=" +
+                               docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
           }
         }

Mike
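
To also answer the "how many terms" question, the warn-and-continue idea can be paired with a simple tally. The following is only a minimal sketch that does not depend on Lucene: the class DocFreqTally, its methods, and the sample term names and counts are all hypothetical and exist purely for illustration. Only the comparison (freq0 + delCount != docFreq) and the warning text are taken from the patch above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: records docFreq mismatches instead of throwing. */
public class DocFreqTally {
    private final List<String> mismatched = new ArrayList<String>();
    private int termsChecked = 0;

    /**
     * Applies the same comparison as the patched CheckIndex branch.
     * Returns true when freq0 + delCount equals docFreq; otherwise prints
     * a warning, remembers the term, and returns false so the caller can continue.
     */
    public boolean check(String term, int docFreq, int freq0, int delCount) {
        termsChecked++;
        if (freq0 + delCount != docFreq) {
            System.out.println("WARNING: term " + term + " docFreq=" + docFreq
                + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
            mismatched.add(term);
            return false;
        }
        return true;
    }

    public int mismatchCount() { return mismatched.size(); }

    public int termsChecked() { return termsChecked; }

    public static void main(String[] args) {
        // Made-up values, only to exercise the tally.
        DocFreqTally tally = new DocFreqTally();
        tally.check("body:alpha", 5, 4, 1);  // consistent: 4 + 1 == 5
        tally.check("body:beta", 7, 4, 1);   // mismatch: 4 + 1 != 7
        tally.check("body:gamma", 2, 2, 0);  // consistent: 2 + 0 == 2
        System.out.println(tally.mismatchCount() + " of " + tally.termsChecked()
            + " terms had a docFreq mismatch");
    }
}
```

In a real run the three arguments would come from the values CheckIndex already has in scope at that point in the loop; how to wire that in is left open here.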