lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Gershman <>
Subject Re: Help with mass delete from large index
Date Mon, 13 Feb 2006 15:24:14 GMT
No problem; this is not meant to be a regular
operation, rather it's a (hopefully) one-time thing
till the index can be restructured.

The data is chronological in nature, deleting
everything before a specific point in time.  The index
is optimized, so is it possible to remove specific
files?  I'm open to other suggestions as to how to
approach this.

Also, I neglected to mention I'm using version 1.4.3.


--- "Michael D. Curtin" <> wrote:

> Greg Gershman wrote:
> > I'm trying to delete a large number of documents
> > (~15million) from a a large index (30+ million
> > documents).  I've started with an optimized index,
> and
> > a list of docIds (our own unique identifier for a
> > document, not a Lucene doc number) to pass to the
> > IndexReader.delete(Term t) method.  I've had a few
> > different problems.
> > ...
> > Any ideas?  I'm really confused, and the only
> other
> > option I can think of is to reindex the documents
> I
> > need, which would take much longer than deleting
> the
> > ones I dont.
> Maybe it would be useful to take a step back up the
> tree of abstractions here 
> and reexamine why you're deleting such a large
> fraction of your index, 
> particularly if you're doing it on a regular basis. 
> For example, is there a 
> chronological or other "natural" break in the data
> such that you could make 2 
> indexes with ~15M docs each in the first place, then
> just delete a few index 
> *files* instead of 15M documents, one at a time?
> --MDC
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message