lucene-java-user mailing list archives

From Anuj Bhatt <>
Subject Re: Doc IDs via IndexReader?
Date Thu, 23 Jul 2009 19:45:30 GMT

Thanks, Shai and Mike, for your suggestions. I went with Shai's second
approach. However, I've now run into the following problem:

After deleting a document from the index, I also delete it from a
copy of the directory that contained the original documents. I
expected that afterwards neither the directory nor the index would
contain the document. To check this, I take each document in the
updated directory, convert it to a query, send that query to the
index via IndexSearcher, and examine the hits. For some reason, the
hits include a document that I had deleted from the index (via
IndexReader). Is there any valid explanation for this? How can I be
sure that the index no longer contains that document? Here's the code
snippet I am experimenting with (hopefully it is self-explanatory):

        System.out.println("Documents which are in the whitelist :
    	IndexReader reader =;
    	for(int doc_itr=0; doc_itr < reader.maxDoc(); doc_itr++)
                       //skip if I encountered this document
    		else if (!reader.isDeleted(doc_itr))
    			System.out.println("Deleting document with name:
    			File docToDelete = new
    			System.out.println("Also deleting original document
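
For reference, the intended traversal (walk every doc id, skip ids that are
already deleted, keep whitelisted documents, delete the rest) can be modeled
in plain Java. This is only a sketch under stated assumptions: it does not
use Lucene, and the class name, method name, and file names are hypothetical.
A list of names stands in for the stored documents and a BitSet stands in
for reader.isDeleted(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the whitelist traversal. NOT Lucene's API;
// every name in this class is hypothetical.
class WhitelistPruneSketch {

    // names.get(id) plays the role of the stored name of document `id`;
    // `deleted` plays the role of reader.isDeleted(id).
    // Returns the names of the documents deleted by this pass.
    static List<String> prune(List<String> names, BitSet deleted, Set<String> whitelist) {
        List<String> removed = new ArrayList<String>();
        int maxDoc = names.size();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (deleted.get(docId)) {
                continue; // already deleted, nothing to do
            }
            String name = names.get(docId);
            if (whitelist.contains(name)) {
                continue; // returned by some query, so keep it
            }
            deleted.set(docId); // stands in for reader.deleteDocument(docId)
            removed.add(name);
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");
        BitSet deleted = new BitSet();
        deleted.set(1); // b.txt was deleted in an earlier pass
        Set<String> whitelist = new HashSet<String>(Arrays.asList("a.txt", "d.txt"));
        System.out.println(prune(names, deleted, whitelist)); // prints [c.txt]
    }
}
```

One thing that may be worth checking against this model: in Lucene 2.4 an
IndexSearcher only sees the index as it was when the searcher was opened, and
deletions made through an IndexReader are written out when that reader is
closed. If the searcher was opened before the reader was closed, it could
still return the deleted documents.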


On Thu, Jul 23, 2009 at 6:24 AM, Michael McCandless<> wrote:
> I think you could also delete by Query (using IndexWriter), concocting
> a single large query that's something like MatchAllDocsQuery AND NOT
> (Q1 OR Q2 OR Q3...) where Q1, Q2, Q3 are the queries that identify the
> docs you want to keep.
> Mike
> On Wed, Jul 22, 2009 at 10:58 PM, Anuj Bhatt<> wrote:
>> Hi,
>> I'm relatively new to Lucene. I have the following case: I have
>> indexed a bunch of documents. I then query the index using
>> IndexSearcher and retrieve the documents using Hits (I do know this is
>> deprecated -- I'm using v 2.4.1). I do this for a set of queries
>> and keep track of which documents are returned for each one. At the end
>> of it all, I have a list of documents maintained (more specifically, the
>> associated with the doc). Now, I wish to
>> delete from the index the documents which have not been returned for
>> any query. How can I do this?
>> My initial assumption was that I could retrieve all the doc ids from
>> IndexReader and check each one against the list that I have
>> maintained: if it is in the list, I don't delete it; otherwise I do.
>> Looking around didn't yield anything, hence this mail.
>> Any suggestions?
>> Regards,
>> Anuj
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: