Date: Mon, 13 Feb 2006 06:47:04 -0800 (PST)
From: Greg Gershman
Subject: Help with mass delete from large index
To: java-user@lucene.apache.org
I'm trying to delete a large number of documents (~15 million) from a large index (30+ million documents). I started with an optimized index and a list of docIDs (our own unique identifier for a document, not a Lucene document number) to pass to the IndexReader.delete(Term t) method. I've run into a few different problems.

The following code is inside the loop that iterates through the document IDs:

    try {
        // Delete every document whose "docID" field matches this ID;
        // delete(Term) returns the number of documents it deleted.
        Term t = new Term("docID", String.valueOf(docID));
        deletedCount += indexReader.delete(t);
    } catch (Exception e) {
        System.out.println("Error while deleting docID#" + docID);
        e.printStackTrace();
    }

To commit the deletions, I also close and reopen the IndexReader periodically. At first I reopened the IndexReader after every 500K documents deleted. The problem was that after ~60-75K deletions, the delete call began to throw a NullPointerException:

    Error while deleting docID#27136356
    java.lang.NullPointerException
        at java.lang.String.compareTo(String.java:402)
        at org.apache.lucene.index.Term.compareTo(Term.java:76)
        at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:132)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
        at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
        at org.apache.lucene.index.IndexReader.delete(IndexReader.java:449)
        at IndexEraser.main(IndexEraser.java:32)

After a little fiddling around, I tried reducing the interval between reopens to 5000, and most of the NullPointerExceptions went away. A test search of the resulting, unoptimized index worked fine.

I then optimized the index to reduce its size. Now, instead of getting data back for many of the results, I get a null value. Any ideas?
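For reference, the delete-and-reopen loop described above can be sketched roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not the actual IndexEraser code: ReaderHandle, ReaderOpener, deleteAll and InMemoryIndex are hypothetical names. ReaderHandle stands in for Lucene's IndexReader (its delete(field, value) plays the role of indexReader.delete(new Term(field, value)), and close() is where pending deletions get committed), and InMemoryIndex is a toy test double so the loop runs without Lucene. The reopen interval here counts docIDs processed, not documents actually deleted. It only illustrates the batching structure; it does not reproduce or explain the NullPointerException or the null results after optimizing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchedDeleteSketch {

    // Hypothetical stand-in for the IndexReader operations the loop uses.
    public interface ReaderHandle {
        int delete(String field, String value); // like indexReader.delete(new Term(field, value))
        void close();                           // like indexReader.close(): commits pending deletions
    }

    // Hypothetical factory; real code would call IndexReader.open(...) here.
    public interface ReaderOpener {
        ReaderHandle open();
    }

    // Deletes each docID, closing and reopening the reader after every
    // reopenInterval IDs so deletions are committed in batches.
    // Returns the total number of documents deleted.
    public static int deleteAll(ReaderOpener opener, List<Long> docIds, int reopenInterval) {
        int deletedCount = 0;
        int sinceReopen = 0;
        ReaderHandle reader = opener.open();
        try {
            for (long docId : docIds) {
                deletedCount += reader.delete("docID", String.valueOf(docId));
                if (++sinceReopen >= reopenInterval) {
                    reader.close();         // commit this batch
                    reader = null;          // avoid a double close if open() fails
                    reader = opener.open(); // next batch uses a fresh reader
                    sinceReopen = 0;
                }
            }
        } finally {
            if (reader != null) {
                reader.close(); // commit the final, partial batch
            }
        }
        return deletedCount;
    }

    // Toy in-memory "index" used only to exercise the loop; counts opens/closes.
    public static class InMemoryIndex implements ReaderOpener {
        public final Set<String> liveIds = new HashSet<>();
        public int opens = 0;
        public int closes = 0;

        public InMemoryIndex(int size) {
            for (int i = 1; i <= size; i++) {
                liveIds.add(String.valueOf(i));
            }
        }

        @Override
        public ReaderHandle open() {
            opens++;
            return new ReaderHandle() {
                @Override
                public int delete(String field, String value) {
                    return liveIds.remove(value) ? 1 : 0; // field ignored in this toy
                }

                @Override
                public void close() {
                    closes++;
                }
            };
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex(10);
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 12; id++) {
            ids.add(id); // IDs 11 and 12 are not in the index
        }
        int deleted = deleteAll(index, ids, 5);
        System.out.println("deleted=" + deleted + " opens=" + index.opens + " closes=" + index.closes);
    }
}
```

Running main with ten indexed IDs, twelve requested IDs and an interval of 5 prints deleted=10 opens=3 closes=3: one reader for each batch of at most five IDs, each closed once.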
I'm really confused, and the only other option I can think of is to reindex the documents I need, which would take much longer than deleting the ones I don't.

Thanks!

Greg Gershman