lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: HitCollector#collect(int,float,Collection<Query>)
Date Tue, 09 Jun 2009 15:56:58 GMT
My guess is such an approach could be made to work...

But I think I'd rather directly improve *Scorer so that they provide
such details (and you pay no performance cost if you don't ask for
these details).  Likewise for positional details of matching, which
highlighter could use.  And, then, we could absorb Span* back into
their primary counterparts.

Mike

On Tue, Jun 2, 2009 at 8:04 AM, Karl Wettin <karl.wettin@gmail.com> wrote:
> So, I've been sleeping on this for a few weeks. Would it be possible to
> solve this with a decorator? Perhaps a top level decorator that also
> decorates all subqueries at rewrite-time and then keeps the instantiated
> scorers bound to the top level decorator, i.e. makes the decorated query
> non-reusable.
>
> Query realQuery = ...
> DecoratedQuery dq = new DecoratedQuery(realQuery);
> searcher.search(dq, ..);
> Map<Query, Float> scoringQueries = dq.getScoringQueries();
>
> Not quite sure if this is terrible or elegant.
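
To make the decorator idea above concrete, here is a minimal, self-contained sketch. It is a toy model and assumes nothing about Lucene itself: the Query interface, TermQuery, OrQuery and the scoring rule (1.0 per matching term, summed by the OR) are all hypothetical stand-ins, and only the names DecoratedQuery and getScoringQueries come from the snippet above. The top-level decorator wraps every sub-query once and records, for the current document, which of the original queries scored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DecoratorSketch {

    /** Hypothetical query type (not Lucene's): a score of 0 means "no match". */
    interface Query {
        float score(Set<String> docTerms);
    }

    /** Toy term query: scores 1.0 when the document contains the term. */
    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
        public float score(Set<String> docTerms) {
            return docTerms.contains(term) ? 1f : 0f;
        }
    }

    /** Toy disjunction: sums the scores of its clauses. */
    static final class OrQuery implements Query {
        final List<Query> clauses;
        OrQuery(List<Query> clauses) { this.clauses = clauses; }
        public float score(Set<String> docTerms) {
            float sum = 0f;
            for (Query clause : clauses) sum += clause.score(docTerms);
            return sum;
        }
    }

    /** Top-level decorator: wraps the whole tree and owns the per-document state. */
    static final class DecoratedQuery implements Query {
        private final Query wrapped;
        private final Map<Query, Float> scoringQueries = new LinkedHashMap<>();

        DecoratedQuery(Query real) { this.wrapped = wrap(real); }

        /** Recursively wraps q; scores are recorded under the ORIGINAL query object. */
        private Query wrap(Query q) {
            Query rebuilt = q;
            if (q instanceof OrQuery) {
                List<Query> wrappedClauses = new ArrayList<>();
                for (Query clause : ((OrQuery) q).clauses) wrappedClauses.add(wrap(clause));
                rebuilt = new OrQuery(wrappedClauses);
            }
            final Query target = rebuilt;
            return docTerms -> {
                float s = target.score(docTerms);
                if (s > 0f) scoringQueries.put(q, s);
                return s;
            };
        }

        public float score(Set<String> docTerms) {
            scoringQueries.clear(); // state is per document, so this object is not reusable concurrently
            return wrapped.score(docTerms);
        }

        /** Sub-queries that scored for the most recently scored document. */
        Map<Query, Float> getScoringQueries() { return scoringQueries; }
    }

    public static void main(String[] args) {
        Query a = new TermQuery("a");
        Query b = new TermQuery("b");
        Query c = new TermQuery("c");
        Query real = new OrQuery(List.of(a, b, c));

        DecoratedQuery dq = new DecoratedQuery(real);
        float total = dq.score(Set.of("a", "c"));
        Map<Query, Float> scoring = dq.getScoringQueries();
        System.out.println(total + " " + scoring.get(a) + " " + scoring.containsKey(b) + " " + scoring.get(real));
    }
}
```

Because the map lives on the decorator and is cleared for each document, the decorated query carries search-time state, which is the "non-reusable" trade-off mentioned above.
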
>
>
>    karl
>
> On 7 Apr 2009, at 12:17, Michael McCandless wrote:
>
>> On Tue, Apr 7, 2009 at 6:13 AM, Karl Wettin <karl.wettin@gmail.com> wrote:
>>>
>>> On 7 Apr 2009, at 10:23, Michael McCandless wrote:
>>>
>>>> Do you mean tracking the "atomic queries" that caused a given hit to
>>>> match (where "atomic query" is a query that actually uses
>>>> TermDocs/Positions to check matching, vs other queries like
>>>> BooleanQuery that "glomm together" sub-query matches)?
>>>>
>>>> EG for a boolean query w/ N clauses, which of those N clauses matched?
>>>
>>> This is exactly what I mean. I do however think it makes sense to get
>>> information about non-atomic queries too, as it seems reasonable that
>>> knowing that the first clause (boolean query '+(a b)') in
>>> '+(a b) -(+c +d)' is matching is more interesting than only getting to
>>> know that one of the clauses of that boolean query is matching.
>>
>> Ahh OK I agree.  So every query in the full tree should be able to
>> state whether it matched the doc.
>>
>>>> A natural place to do this is Scorer API, ie extend it with a
>>>> "getMatchingAtomicQueries" or some such.  Probably, for efficiency,
>>>> each Query should be pre-assigned an int position, and then the
>>>> matching is represented as a bit array, reused across matches.  Your
>>>> collector could then ask the scorer for these bits if it wanted.
>>>> There should be no performance cost for collectors that don't use this
>>>> functionality.
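
A rough sketch of the bit-array idea above, as a toy model rather than the real Scorer API: the Node, TermNode and OrNode types are hypothetical, documents are plain sets of terms, and java.util.BitSet stands in for the bit array. Every node in the tree, composite ones included, gets a pre-assigned int position. Passing null for the bits means the caller does not want match details, in which case the OR can stop at its first matching clause.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Set;

public class MatchBitsSketch {

    /** Hypothetical query node; each node has a pre-assigned bit position. */
    interface Node {
        /** Returns whether the doc matches; sets this node's bit when bits is non-null. */
        boolean matches(Set<String> docTerms, BitSet bits);
    }

    /** "Atomic" node: checks a single term against the document. */
    static final class TermNode implements Node {
        final int position;
        final String term;
        TermNode(int position, String term) {
            this.position = position;
            this.term = term;
        }
        public boolean matches(Set<String> docTerms, BitSet bits) {
            boolean hit = docTerms.contains(term);
            if (hit && bits != null) bits.set(position);
            return hit;
        }
    }

    /** Composite node: matches when any clause matches. */
    static final class OrNode implements Node {
        final int position;
        final List<Node> clauses;
        OrNode(int position, List<Node> clauses) {
            this.position = position;
            this.clauses = clauses;
        }
        public boolean matches(Set<String> docTerms, BitSet bits) {
            boolean hit = false;
            for (Node clause : clauses) {
                if (clause.matches(docTerms, bits)) {
                    hit = true;
                    // Nobody asked for details: stop early, no extra work.
                    if (bits == null) break;
                }
            }
            if (hit && bits != null) bits.set(position);
            return hit;
        }
    }

    public static void main(String[] args) {
        // Positions 0..2 are the term clauses, position 3 is the enclosing OR.
        Node query = new OrNode(3, List.of(
                new TermNode(0, "a"), new TermNode(1, "b"), new TermNode(2, "c")));

        BitSet bits = new BitSet(4); // allocated once, reused across documents
        List<Set<String>> docs = List.of(Set.of("a", "c"), Set.of("b"), Set.of("x"));
        for (Set<String> doc : docs) {
            bits.clear();
            boolean hit = query.matches(doc, bits);
            System.out.println(hit + " " + bits);
        }
    }
}
```

A collector that wants the details would read the bits after each hit and map positions back to queries through whatever table assigned them; a collector that does not care passes null and pays nothing beyond the normal matching work.
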
>>>
>>> I'll look into it.
>>>
>>> Thanks for the feedback.
>>>
>>>
>>>    karl
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
