lucene-java-user mailing list archives

From Jamie <ja...@stimulussoft.com>
Subject Re: combine results from multiple queries & sort
Date Wed, 14 Mar 2012 11:47:58 GMT
Li

Many thanks for the tip. I used the searchWithFilter approach and it's
working brilliantly!!

For the benefit of others, the solution is as follows:

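// Restrict the search to documents whose "uid" field holds one of the wanted IDs
TermsFilter idFilter = new TermsFilter();
for (String id : ids) {
    idFilter.addTerm(new Term("uid", id));
}
// tfc: the TopFieldCollector (created with the desired Sort) that sorts the hits
searcher.search(query, idFilter, tfc);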
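
For readers curious what a filtered, sorted search does conceptually, here is a plain-Java sketch with no Lucene dependency. It is not Lucene's TopFieldCollector: the class FilteredTopN, the BitSet standing in for the TermsFilter result, and the sortKeys array standing in for the chosen sort field are all hypothetical. Only documents whose filter bit is set are considered, and a bounded priority queue keeps the best n hits by sort key.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: collect the top n filtered docs, sorted ascending by a string key. */
final class FilteredTopN {
    private FilteredTopN() {}

    /**
     * @param matches  doc ids matched by the query, in increasing order
     * @param filter   bit set of doc ids allowed by the filter (e.g. the wanted uids)
     * @param sortKeys sortKeys[doc] is the value of the chosen sort field for doc
     * @param n        maximum number of hits to return
     */
    static List<Integer> collect(int[] matches, BitSet filter, String[] sortKeys, int n) {
        // Max-heap on the sort key: the head is the "worst" hit kept so far.
        PriorityQueue<Integer> queue =
            new PriorityQueue<>(Math.max(1, n), (a, b) -> sortKeys[b].compareTo(sortKeys[a]));
        for (int doc : matches) {
            if (!filter.get(doc)) {
                continue; // filtered out: not one of the requested IDs
            }
            queue.offer(doc);
            if (queue.size() > n) {
                queue.poll(); // drop the current worst hit to keep at most n
            }
        }
        List<Integer> result = new ArrayList<>(queue);
        result.sort((a, b) -> sortKeys[a].compareTo(sortKeys[b])); // best hit first
        return result;
    }
}
```

For example, with filter bits {1, 3, 4}, sort keys {"e", "d", "c", "b", "a"} and n = 2, collect returns [4, 3]. The point of the design is that filtering and sorting happen in a single pass, so there is no merged list left to sort afterwards.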

Regards

Jamie
On 2012/03/14 12:44 PM, Li Li wrote:
> It's a very common problem. Many of our users (including programmers
> familiar with SQL) have the same question.
>
> Compared with SQL, all queries in Lucene are based on an inverted index.
> Fortunately, when searching, we can provide a Filter.
>
> From the source code of the searchWithFilter method, we can see that
> searching with a filter is similar to a boolean AND query.
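
A minimal sketch of this "boolean AND" behaviour, assuming the query's hits and the filter's allowed documents are available as sorted arrays of document IDs (a simplification of Lucene's doc-ID iterators). Each side repeatedly advances to the other's current document, and only documents both sides land on are kept. LeapfrogIntersect and advance are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: AND two sorted doc-id lists by leapfrogging. */
final class LeapfrogIntersect {
    private LeapfrogIntersect() {}

    /** Returns the first index i >= from with docs[i] >= target, or docs.length. */
    static int advance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++; // linear scan here; a real index can skip ahead much faster
        }
        return i;
    }

    /** Docs that appear both in the query's hits and in the filter's allowed set. */
    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> out = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            if (queryDocs[q] == filterDocs[f]) {
                out.add(queryDocs[q]); // both agree: doc matches query AND filter
                q++;
                f++;
            } else if (queryDocs[q] < filterDocs[f]) {
                q = advance(queryDocs, q, filterDocs[f]); // query catches up to filter
            } else {
                f = advance(filterDocs, f, queryDocs[q]); // filter catches up to query
            }
        }
        return out;
    }
}
```

For example, intersect({1, 4, 7, 9, 12}, {2, 4, 9, 10, 12}) returns [4, 9, 12].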
>
> I think you can use TermsFilter. It just iterates through the terms
> (your IDs) and uses a BitSet to do the filtering: if a document contains
> any of the terms, its bit is set to 1, otherwise it is 0.
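
To make the BitSet description concrete, here is a sketch over a toy in-memory inverted index, assuming a Map from each uid term to the document IDs containing it. ToyTermsFilter is hypothetical and only mirrors the idea described above; it is not Lucene's TermsFilter source.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

/** Hypothetical sketch of a terms filter over a toy in-memory inverted index. */
final class ToyTermsFilter {
    private ToyTermsFilter() {}

    /**
     * @param postings maps each term of the "uid" field to the ids of the docs containing it
     * @param terms    the wanted ids
     * @param maxDoc   number of documents in the index
     */
    static BitSet bits(Map<String, int[]> postings, Collection<String> terms, int maxDoc) {
        BitSet bits = new BitSet(maxDoc); // every doc starts at 0 (excluded)
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index: contributes nothing
            }
            for (int doc : docs) {
                bits.set(doc); // doc contains this term: set its bit to 1 (allowed)
            }
        }
        return bits;
    }
}
```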
>
> I think this implementation is fast enough. It uses the .tii/.tis files
> to locate each term, and for each term it iterates through its postings
> in the .frq file. Postings may be cached by Lucene.
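
The lookup cost described here can be pictured with a much-simplified stand-in for the term dictionary: a sorted array of terms searched by binary search, with a parallel array of postings. This is a toy under stated assumptions (ToyTermDictionary is hypothetical and ignores the real .tii/.tis/.frq formats); it only shows why each ID costs one dictionary lookup plus a walk over that term's postings.

```java
import java.util.Arrays;

/** Hypothetical toy term dictionary: sorted terms with a parallel postings array. */
final class ToyTermDictionary {
    private final String[] sortedTerms; // must be sorted for binary search to work
    private final int[][] postings;     // postings[i] = doc ids containing sortedTerms[i]

    ToyTermDictionary(String[] sortedTerms, int[][] postings) {
        this.sortedTerms = sortedTerms;
        this.postings = postings;
    }

    /** Locates a term by binary search; returns its doc ids, or an empty array if absent. */
    int[] docsFor(String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? postings[i] : new int[0];
    }
}
```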
>
> If it can't meet your performance needs, you can implement your own
> Collector and use your own cache policy (e.g. load this field into
> memory as a HashMap from your IDs to document IDs). When a query is
> "id in (1,3,5)", you construct a Collector, and when it collects docs,
> you filter out the unwanted documents.
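
A minimal sketch of that custom-collector idea, assuming a cached Map from uid value to internal document ID has already been built. IdInCollector and its collect(int) method are hypothetical; they mirror the per-hit callback idea but do not implement Lucene's actual Collector API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical "id in (...)" collector; not Lucene's Collector API. */
final class IdInCollector {
    private final Set<Integer> allowedDocs = new HashSet<>();
    private final List<Integer> hits = new ArrayList<>();

    /**
     * @param uidToDoc cached uid -> internal doc id map (built once and reused; it would need
     *                 rebuilding whenever the index changes, since doc ids can shift)
     * @param wanted   the ids from the "id in (...)" query
     */
    IdInCollector(Map<String, Integer> uidToDoc, Collection<String> wanted) {
        for (String uid : wanted) {
            Integer doc = uidToDoc.get(uid);
            if (doc != null) {
                allowedDocs.add(doc); // unknown uids are simply ignored
            }
        }
    }

    /** Called once per document matched by the main query. */
    void collect(int doc) {
        if (allowedDocs.contains(doc)) {
            hits.add(doc); // keep only the requested documents
        }
    }

    List<Integer> hits() {
        return hits;
    }
}
```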
>
> On Wed, Mar 14, 2012 at 4:01 PM, Jamie <jamie@stimulussoft.com> wrote:
>
>> Greetings!
>>
>> First off, I realize Lucene is a search engine and therefore does not
>> possess many of the features of a database. That being said, I have
>> encountered a particular use case where I need to look up potentially
>> thousands of records in a Lucene index based upon an ID (a String field in
>> the index). This data also needs to be sorted based upon any chosen field in
>> the index. In pseudocode, this is how it's currently done:
>>
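>> List<String> ids = Arrays.asList("123aeeff", "34eacc" /* , ... */);
>>
>> results.clear(); // combined hits, a linked list filled in by search()
>> // Batch the IDs into queries of at most 1024 terms
>> // (Lucene's default BooleanQuery.maxClauseCount)
>> StringBuilder lookupQuery = new StringBuilder();
>> for (int i = 0; i < ids.size(); i++) {
>>     lookupQuery.append(ids.get(i)).append(' ');
>>     if ((i + 1) % 1024 == 0) {
>>         search(lookupQuery.toString()); // runs one query, appends hits to results
>>         lookupQuery = new StringBuilder();
>>     }
>> }
>> if (lookupQuery.length() > 0) {
>>     search(lookupQuery.toString()); // remaining IDs
>> }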
>>
>> As you can see, the loop builds Lucene queries of at most 1024 terms
>> each, for example consisting of the IDs "123aeeff 34eacc ..". After each
>> query is constructed, a search is executed and its results are combined
>> into a single linked list (this is done in the search function). This
>> works well aside from two outstanding questions:
>>
>> 1. Is executing separate search queries the best way to look up
>>    thousands of records in an index? Is there a more efficient way to
>>    look up thousands of records based upon ID?
>> 2. The results are unsorted after they are combined into a single
>>    linked list. What is the best way to sort the combined results based
>>    upon any chosen field in the Lucene index? Is there a way to do this
>>    that leverages Lucene's built-in sort abilities?
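
Without Lucene's own sorting, the combined list from question 2 would have to be sorted in application code. A minimal sketch of that fallback, assuming each hit is represented as a Map of stored field names to string values (this representation and the class MergedHitSorter are hypothetical, not Lucene's Document API), with hits that lack the field placed last. The solution at the top of this thread avoids this step by running a single filtered search with a sorting collector.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sort merged hits in memory by one stored field. */
final class MergedHitSorter {
    private MergedHitSorter() {}

    /** Sorts hits in place by the given field's string value; hits lacking it go last. */
    static void sortByField(List<Map<String, String>> hits, String field) {
        hits.sort(Comparator.comparing(
            (Map<String, String> hit) -> hit.get(field),
            Comparator.nullsLast(Comparator.<String>naturalOrder())));
    }
}
```

Note that values are compared as strings, so numeric or date fields only sort correctly if they are stored in a lexicographically sortable form.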
>>
>> Many thanks for your consideration
>>
>> Jamie
>>
>>



