lucene-java-user mailing list archives

From: Steven Rowe <sar...@syr.edu>
Subject: Re: How can I search over all documents NOT in a certain subset?
Date: Fri, 08 Jun 2007 12:46:55 GMT
Hi Hilton,

Hilton Campbell wrote:
> Yes, that's actually come up.  The document ids are indeed changing which is
> causing problems.  I'm still trying to work it out myself, but any help
> would most definitely be appreciated.
> 
> Thanks,
> Hilton Campbell
> 
> -----Original Message-----
> From: Antony Bowesman [mailto:adb@teamware.com] 
> Sent: Wednesday, June 06, 2007 11:36 PM
> To: java-user@lucene.apache.org
> Subject: Re: How can I search over all documents NOT in a certain subset?
> 
> Steven Rowe wrote:
>> Conceptually (caveat: untested), you could:
>>
>> 1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per
>> IndexReader.  The BitSet would hold one bit per doc[2], each initialized
>> to true.
>>
>> 2. Unset a DejaVuFilter instance's bit for each of your top N docs by
>> walking the TopDocs returned by Searcher.search(Query,Filter,int)[3].
>> Initially, you could pass in null for the Filter, and then for all
>> following calls, an instance of DejaVuFilter.
> 
> Just a thought...
> 
> If Hilton wants to be aware of new Documents in the index since the previous
> search, this requires opening a new IndexReader.
> 
> If only Documents have been added to the index, I expect (but am not
> sure) that the bits from the old IndexReader are still valid for the
> document numbers in the new reader. However, if deletions or
> optimisation have occurred between reader instances, then the
> document ids from the old reader may not represent the same
> documents in the new reader, so the Filter built for the old reader
> will not be valid for a search against the new reader and you may
> get false matches.
> 
> I don't think there will be a problem if there are no deletions.

My bad for not pointing out this shortcoming.
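
To make the problem concrete, here is a minimal, Lucene-free sketch (untested against Lucene itself). It only models the bookkeeping: a java.util.BitSet with one bit per document number, and a plain list standing in for the index. The class and method names (DejaVuSketch, newUnseenBits, markSeen, deleteAndCompact, allowed) are made up for illustration and are not Lucene API; deleteAndCompact is an assumed stand-in for what a deletion followed by optimisation does to document numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class DejaVuSketch {

    /** One bit per document number, all set: nothing has been seen yet. */
    static BitSet newUnseenBits(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);
        return bits;
    }

    /** Clear the bit of every document number already shown to the user. */
    static void markSeen(BitSet bits, int... docNumbers) {
        for (int doc : docNumbers) {
            bits.clear(doc);
        }
    }

    /**
     * Hypothetical stand-in for "delete, then optimise": removing one
     * document shifts every later document number down by one.
     */
    static List<String> deleteAndCompact(List<String> docs, int docNumber) {
        List<String> copy = new ArrayList<>(docs);
        copy.remove(docNumber); // int argument: removes by index
        return copy;
    }

    /** Documents whose bit is still set, i.e. what the filter lets through. */
    static List<String> allowed(List<String> docs, BitSet bits) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (bits.get(i)) {
                out.add(docs.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a", "b", "c", "d");
        BitSet bits = newUnseenBits(before.size());
        markSeen(bits, 2); // the user has already seen "c"

        // Same reader: the bits line up with the documents.
        System.out.println(allowed(before, bits)); // [a, b, d]

        // New reader after document 0 was deleted and the index compacted:
        // the old bits now point at the wrong documents.
        List<String> after = deleteAndCompact(before, 0);
        System.out.println(allowed(after, bits)); // [b, c]
    }
}
```

In the second case the already-seen "c" comes back and the unseen "d" is filtered out, which is the false match described above. Bits taken from one reader are only safe to reuse while the document numbers stay the same.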

Karl Wettin's patch may be useful to you:

  <https://issues.apache.org/jira/browse/LUCENE-879>

Steve

-- 
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
