lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: deleteDocuments by Term[] for ALL terms
Date Tue, 04 Dec 2007 09:17:56 GMT
Thanks Mike, just what I was after.
Antony


Michael McCandless wrote:
> You can just create a query with your and'd terms, and then do this:
> 
>   Weight weight = query.weight(indexSearcher);
>   IndexReader reader = indexSearcher.getIndexReader();
>   Scorer scorer = weight.scorer(reader);
>   int delCount = 0;
>   while(scorer.next()) {
>     reader.deleteDocument(scorer.doc());
>     delCount++;
>   }
> 
> that iterates over all the docIDs without scoring them and without
> building up a Hit for each, etc.
> 
> Mike
> 
> "Antony Bowesman" <adb@teamware.com> wrote:
>> Hi,
>>
>> I'm using IndexReader.deleteDocuments(Term) to delete documents in
>> batches.  I 
>> need the deleted count, so I cannot use IndexWriter.deleteDocuments().
>>
>> What I want to do is delete documents based on more than one term, but
>> not like 
>> IndexWriter.deleteDocuments(Term[]) which deletes all documents with ANY
>> term. 
>> I want it to delete documents which have ALL terms, e.g.
>>
>> Term("owner", "ownerUID") AND Term("subject", "something");
>>
>> I have a new reader being used, so I could make a new IndexSearcher and
>> query 
>> the documents with all terms in a BooleanQuery and then iterate the
>> results, but 
>> that would either mean using the Hits mechanism or TopDocs and seems like
>> a 
>> heavyweight way to do things.
>>
>> What I want to be able to do is to delete sequentially without storing up
>> a 
>> result set as I may want to delete all and ALL may be rather big.
>>
>> I see the implementation uses reader.termDocs() to do the deletion for a
>> single 
>> Term, which of course is easy, but is there a simple way to make a
>> deletion for 
>> multiple terms with AND via the reader using, say termDocs, that will not 
>> potentially use large amounts of memory, or should I just go with the
>> searcher 
>> TopDocs mechanism and do that also in batches to avoid the risk of a
>> large 
>> memory hit.
>>
>> I know there's lots of clever 'expert-mode' stuff under the Lucene API
>> hood, but 
>> does anyone know any good way to do this or have I missed anything
>> obvious in 
>> the API docs?
>>
>> Thanks
>> Antony
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message