lucene-dev mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: HitCollector#collect(int,float,Collection<Query>)
Date Tue, 02 Jun 2009 12:04:19 GMT
So, I've been sleeping on this for a few weeks. Would it be possible
to solve this with a decorator? Perhaps a top-level decorator that
also decorates all subqueries at rewrite time and then keeps the
instantiated scorers bound to the top-level decorator, i.e. makes the
decorated query non-reusable.

Query realQuery = ...
DecoratedQuery dq = new DecoratedQuery(realQuery);
searcher.search(dq, ..);
Map<Query, Float> scoringQueries = dq.getScoringQueries();

Not quite sure if this is terrible or elegant.
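
A minimal, self-contained sketch of how such a decorator might behave.
It deliberately does not use Lucene's Query or Scorer classes: ToyQuery,
ToyTermQuery, ToyOrQuery and this DecoratedQuery implementation are
hypothetical stand-ins, and a "document" is just a set of terms. The
point is only to show sub-queries being wrapped at construction time and
reporting into state owned by the top-level decorator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a query (NOT Lucene's Query): a score of 0 means "no match". */
interface ToyQuery {
    float score(Set<String> docTerms);
}

/** Matches when the document contains the term. */
final class ToyTermQuery implements ToyQuery {
    private final String term;
    ToyTermQuery(String term) { this.term = term; }
    @Override public float score(Set<String> docTerms) { return docTerms.contains(term) ? 1.0f : 0.0f; }
    @Override public String toString() { return term; }
}

/** Disjunction: the score is the sum of the clause scores. */
final class ToyOrQuery implements ToyQuery {
    final List<ToyQuery> clauses;
    ToyOrQuery(List<ToyQuery> clauses) { this.clauses = clauses; }
    @Override public float score(Set<String> docTerms) {
        float sum = 0.0f;
        for (ToyQuery clause : clauses) sum += clause.score(docTerms);
        return sum;
    }
    @Override public String toString() { return clauses.toString(); }
}

/** Hypothetical decorator: wraps the whole query tree and records which queries scored. */
final class DecoratedQuery implements ToyQuery {
    private final ToyQuery original;            // key reported back to the caller
    private final ToyQuery delegate;            // original, or a copy with decorated clauses
    private final Map<ToyQuery, Float> scoring; // shared by every decorator in the tree
    private final boolean root;

    DecoratedQuery(ToyQuery original) {
        this(original, new LinkedHashMap<ToyQuery, Float>(), true);
    }

    private DecoratedQuery(ToyQuery original, Map<ToyQuery, Float> scoring, boolean root) {
        this.original = original;
        this.scoring = scoring;
        this.root = root;
        if (original instanceof ToyOrQuery) {
            // Decorate each sub-query so it reports into the same shared map.
            List<ToyQuery> wrapped = new ArrayList<ToyQuery>();
            for (ToyQuery clause : ((ToyOrQuery) original).clauses) {
                wrapped.add(new DecoratedQuery(clause, scoring, false));
            }
            this.delegate = new ToyOrQuery(wrapped);
        } else {
            this.delegate = original;
        }
    }

    @Override public float score(Set<String> docTerms) {
        if (root) scoring.clear(); // per-document state: this is what makes the query non-reusable
        float score = delegate.score(docTerms);
        if (score > 0.0f) scoring.put(original, score);
        return score;
    }

    Map<ToyQuery, Float> getScoringQueries() { return scoring; }
}

final class DecoratedQueryDemo {
    /** Scores one document and returns the scoring queries keyed by their string form. */
    static Map<String, Float> run() {
        ToyQuery real = new ToyOrQuery(
                Arrays.<ToyQuery>asList(new ToyTermQuery("a"), new ToyTermQuery("b")));
        DecoratedQuery dq = new DecoratedQuery(real);
        dq.score(new HashSet<String>(Arrays.asList("a", "c")));
        Map<String, Float> byName = new LinkedHashMap<String, Float>();
        for (Map.Entry<ToyQuery, Float> e : dq.getScoringQueries().entrySet()) {
            byName.put(e.getKey().toString(), e.getValue());
        }
        return byName;
    }

    public static void main(String[] args) { System.out.println(run()); }
}
```

In this toy version the map is cleared and refilled for every document,
so it would have to be read from inside the collector for each hit, and
one DecoratedQuery instance cannot safely serve two searches at once.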


     karl

On 7 Apr 2009 at 12:17, Michael McCandless wrote:

> On Tue, Apr 7, 2009 at 6:13 AM, Karl Wettin <karl.wettin@gmail.com> wrote:
>>
>> On 7 Apr 2009 at 10:23, Michael McCandless wrote:
>>
>>> Do you mean tracking the "atomic queries" that caused a given hit to
>>> match (where "atomic query" is a query that actually uses
>>> TermDocs/Positions to check matching, vs other queries like
>>> BooleanQuery that "glomm together" sub-query matches)?
>>>
>>> EG for a boolean query w/ N clauses, which of those N clauses matched?
>>
>> This is exactly what I mean. I do however think it makes sense to get
>> information about non-atomic queries, as it seems reasonable that
>> knowing the first clause (boolean query '+(a b)') in '+(a b) -(+c +d)'
>> matched is more interesting than only knowing that one of the clauses
>> of that boolean query matched.
>
> Ahh OK I agree.  So every query in the full tree should be able to
> state whether it matched the doc.
>
>>> A natural place to do this is Scorer API, ie extend it with a
>>> "getMatchingAtomicQueries" or some such.  Probably, for efficiency,
>>> each Query should be pre-assigned an int position, and then the
>>> matching is represented as a bit array, reused across matches.  Your
>>> collector could then ask the scorer for these bits if it wanted.
>>> There should be no performance cost for collectors that don't use
>>> this functionality.
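
The bit-array idea quoted above could be sketched roughly as follows.
This is a toy model, not Lucene's Scorer API: the MatchBits class, its
match() and getMatchingBits() methods, and the use of a term set as a
"document" are all assumptions made for illustration. Each clause gets
an int position up front, and one java.util.BitSet is reused for every
document. Unlike the proposal, this sketch always fills the bits, so it
does not model the "no cost if unused" part.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical scorer stand-in: one term clause per pre-assigned bit position. */
final class MatchBits {
    private final String[] terms; // index in this array == pre-assigned bit position
    private final BitSet matched; // single instance, reused across documents

    MatchBits(String... terms) {
        this.terms = terms.clone();
        this.matched = new BitSet(terms.length);
    }

    /** Checks one document; returns true if at least one clause matched. */
    boolean match(Set<String> docTerms) {
        matched.clear(); // reset the reused bits before each document
        for (int position = 0; position < terms.length; position++) {
            if (docTerms.contains(terms[position])) matched.set(position);
        }
        return !matched.isEmpty();
    }

    /** Bits are only valid until the next call to match(); a collector must copy them to keep them. */
    BitSet getMatchingBits() { return matched; }

    public static void main(String[] args) {
        MatchBits scorer = new MatchBits("a", "b", "c");
        scorer.match(new HashSet<String>(Arrays.asList("a", "c")));
        System.out.println(scorer.getMatchingBits());
    }
}
```

A collector that cares about per-clause matches would call
getMatchingBits() inside its collect method; one that does not would
simply never ask.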
>>
>> I'll look into it.
>>
>> Thanks for the feedback.
>>
>>
>>     karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org

