lucene-java-user mailing list archives

From "Michael D. Curtin" <m...@curtin.com>
Subject Re: Help with mass delete from large index
Date Mon, 13 Feb 2006 14:52:45 GMT
Greg Gershman wrote:

> I'm trying to delete a large number of documents
> (~15 million) from a large index (30+ million
> documents).  I've started with an optimized index, and
> a list of docIds (our own unique identifier for a
> document, not a Lucene doc number) to pass to the
> IndexReader.delete(Term t) method.  I've had a few
> different problems.
> ...
> Any ideas?  I'm really confused, and the only other
> option I can think of is to reindex the documents I
> need, which would take much longer than deleting the
> ones I don't.

Maybe it would be useful to take a step back up the tree of abstractions here 
and reexamine why you're deleting such a large fraction of your index, 
particularly if you're doing it on a regular basis.  For example, is there a 
chronological or other "natural" break in the data such that you could make 2 
indexes with ~15M docs each in the first place, then just delete a few index 
*files* instead of 15M documents, one at a time?
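
A minimal sketch of that layout, using only the JDK and no Lucene calls. It assumes each document carries a date and that one index directory per calendar year is a sensible break; the class name, the "index-<year>" naming scheme, and both helper methods are hypothetical and only illustrate the idea. Retiring a year's worth of documents then becomes a directory removal rather than millions of per-document deletes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Comparator;
import java.util.stream.Stream;

final class PartitionedIndexLayout {

    // Hypothetical naming scheme: one index directory per calendar year.
    static String partitionName(LocalDate docDate) {
        return "index-" + docDate.getYear();
    }

    // Directory that a document with the given date would be indexed into.
    static Path partitionDir(Path root, LocalDate docDate) {
        return root.resolve(partitionName(docDate));
    }

    // Removes a whole partition directory and everything in it.
    // Returns false if the partition does not exist.
    static boolean dropPartition(Path root, String name) throws IOException {
        Path dir = root.resolve(name);
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }
}
```

At indexing time each document would be written to the directory returned by partitionDir; any open readers or searchers on a partition would need to be closed before dropPartition is called on it. Searching across the remaining partitions is not covered by this sketch.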

--MDC
