lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Gershman <gregge...@yahoo.com>
Subject Re: Help with mass delete from large index
Date Wed, 15 Feb 2006 00:45:08 GMT
I tried the same operation with the nightly 1.9 build
and it worked fine, no NPEs during the deletes, and
after optimization, search worked fine.

I did a little bit of debugging, a call to getField
returned null, so I think it was more than just the
Term value that was missing.  As the error only
occured after optimization, it must have something to
do with the document numbers...

Is it still worth creating a test case, or since it's
fixed in 1.9, just moving on?

Greg

--- Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:

> I have seen this error in my Simpy logs before....
> at least the NPE in compareTo (I don't recall the
> rest of the stack).
> Have you tried debugging this?  I suppose the Term
> field or value is null somehow... not sure why.
> 
> Otis
> P.S.
> Deleting files - don't :)
> 
> ----- Original Message ----
> From: Greg Gershman <greggersh@yahoo.com>
> To: java-user@lucene.apache.org
> Sent: Mon 13 Feb 2006 09:47:04 AM EST
> Subject: Help with mass delete from large index
> 
> I'm trying to delete a large number of documents
> (~15million) from a a large index (30+ million
> documents).  I've started with an optimized index,
> and
> a list of docIds (our own unique identifier for a
> document, not a Lucene doc number) to pass to the
> IndexReader.delete(Term t) method.  I've had a few
> different problems.
> 
> The following code is inside the loop that iterates
> through the document IDs:
> 
>                   try {
>                     Term t = new Term("docID",
> String.valueOf(docID));
>                    
> deletedCount+=indexReader.delete(t);
>                     }
>                    catch (Exception e)
>                    {
>                        System.out.println("Error
> while
> deleting docID#" + docID);
>                        e.printStackTrace();
>                    }
> 
> In order to commit the deletions, I also close and
> reopen the IndexReader periodically.
> 
> At first I was reopening the IndexReader after every
> 500K documents deleted.  The problem was that after
> ~60-75K deletions, the delete call began to throw a
> NullPointerException:
> 
> Error while deleting docID#27136356
> java.lang.NullPointerException
>         at
> java.lang.String.compareTo(String.java:402)
>         at
> org.apache.lucene.index.Term.compareTo(Term.java:76)
>         at
>
org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
>         at
>
org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:132)
>         at
>
org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
>         at
>
org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
>         at
>
org.apache.lucene.index.IndexReader.delete(IndexReader.java:449)
>         at IndexEraser.main(IndexEraser.java:32)
> 
> After a little fiddling around, I tried reducing the
> interval between reopens to 5000, and most of the
> NullPointerExceptions went away.
> 
> A test search of the resulting, unoptimized index
> worked fine.
> 
> I then optimized the index to reduce the size of the
> index.  Now, instead of getting data back for many
> of
> the results, I get a null value.
> 
> Any ideas?  I'm really confused, and the only other
> option I can think of is to reindex the documents I
> need, which would take much longer than deleting the
> ones I dont.
> 
> Thanks!
> 
> Greg Gershman
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> http://mail.yahoo.com 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> 
> 
> 
> 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> 
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message