lucene-java-user mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: How can I search over all documents NOT in a certain subset?
Date Tue, 05 Jun 2007 20:49:34 GMT
Hi Hilton,

Hilton Campbell wrote:
> Hello all,
> 
> In my application I want to perform a search over all the documents
> that are NOT in a certain subset, and I'm not sure how I should do
> it.
> 
> Specifically, the application performs a search and the top N results
> are shown to the user. The user may then opt to see the next top N 
> results. By the time the user chooses to see the next N results,
> however, there may be new, highly-relevant documents in the index (as
> indexing is occurring concurrently). So instead of just skipping to
> the next N, I need to perform a new search and get the top N that
> haven't been seen yet. Is anyone aware of an efficient way to
> implement this?
> 
> I can think of at least one way: I can keep track of the documents 
> that have been seen and iterate through all the hits, skipping those 
> that have already been seen. I just want to see if there isn't a 
> better way that doesn't iterate through potentially hundreds of 
> already seen hits, or if anyone has any pointers on an efficient
> implementation of this idea.

Conceptually (caveat: untested), you could:

1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per
IndexReader.  The BitSet would hold one bit per doc[2], each initialized
to true.

2. Clear the DejaVuFilter instance's bit for each of your top N docs by
walking the TopDocs returned by Searcher.search(Query,Filter,int)[3].
For the first call you could pass in null for the Filter; for all
following calls, pass the DejaVuFilter instance.

3. Repeat step #2 as many times as necessary.
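
The bookkeeping in steps 1-3 could be sketched roughly as follows. This
is only an illustration under assumptions: DejaVuBits and nextPage are
hypothetical names, no Lucene classes are used, maxDoc stands in for
IndexReader.maxDoc(), and the rankedDocIds array stands in for the hits
a search would produce. In Lucene 2.1 the BitSet would instead be
returned from a Filter subclass's bits(IndexReader) method, so the
searcher itself would skip the cleared docs.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for the state a DejaVuFilter would hold for one
// IndexReader. maxDoc plays the role of IndexReader.maxDoc().
class DejaVuBits {
    private final BitSet unseen;

    DejaVuBits(int maxDoc) {
        unseen = new BitSet(maxDoc);
        unseen.set(0, maxDoc);          // step 1: every bit starts out true
    }

    // Step 2: clear the bit of every doc that was shown to the user.
    void markSeen(int[] docIds) {
        for (int docId : docIds) {
            unseen.clear(docId);
        }
    }

    // The BitSet a real filter would hand back to the searcher.
    BitSet bits() {
        return unseen;
    }

    // Stand-in for a filtered search (an assumption, not Lucene code):
    // returns up to n doc ids from rankedDocIds (best first) whose bit is
    // still set, then marks them as seen.
    int[] nextPage(int[] rankedDocIds, int n) {
        int[] page = new int[n];
        int count = 0;
        for (int docId : rankedDocIds) {
            if (count == n) {
                break;
            }
            if (unseen.get(docId)) {
                page[count++] = docId;
            }
        }
        int[] result = Arrays.copyOf(page, count);
        markSeen(result);
        return result;
    }
}
```

Calling nextPage repeatedly on the same DejaVuBits instance corresponds
to step 3: each call returns only docs that no earlier call returned.
The sketch fixes maxDoc at construction, so it models a single reader
only.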

Steve

[1]
<http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Filter.html>
[2]
<http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/IndexReader.html#maxDoc()>
[3]
<http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.Filter,%20int)>

-- 
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
