lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3395) FreqFilteringScorerWrapper and min/max freq options on TermQuery
Date Mon, 22 Aug 2011 19:20:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088943#comment-13088943 ]

Robert Muir commented on LUCENE-3395:
-------------------------------------

{quote}
I'm totally on board with the idea, I just don't really understand what the implementation
should look like – not because I don't think it's possible, I just don't personally understand
how to write it. Would the weight of the wrapper query just proxy directly to the inner query
for all of the methods except scorer(..)?
{quote}

I'm just looking and typing and not testing, but...
# getValueForNormalization/normalize: I think in general these delegate, similar to constant
score query, because conceptually you could give this wrapping query a boost (e.g. someone
says wrap("foo",min=3)^5 OR "foo"^1 <-- dumb syntax to indicate that if it has at least 3 matches
it gets a special boost)
# explain: I think this doesn't need to be particularly efficient; it's more important that it's
correct. So in this case the Weight would implement this method, first calling sub.scorer
and checking that the freq is in bounds (else no match). If the freq is in bounds, then it just
delegates (yeah, I know this means the sub 'explains again', but it's easy?)

and the rest seem obvious to me?
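
To make the delegation described above concrete, here is a minimal, untested-against-Lucene
sketch. It does NOT use the real Lucene classes: SimpleWeight, SimpleScorer, FreqFilterWeight
and ToyTermWeight are hypothetical stand-ins invented for illustration, and folding the
wrapper's boost into topLevelBoost is only one assumed way of handling the boost.

```java
import java.util.Map;

class FreqFilterWeightSketch {

    /** Hypothetical stand-in for a Scorer positioned on a single document. */
    interface SimpleScorer {
        float freq();
        float score();
    }

    /** Hypothetical, simplified stand-in for a Weight; NOT Lucene's real API. */
    interface SimpleWeight {
        float getValueForNormalization();
        void normalize(float norm, float topLevelBoost);
        SimpleScorer scorer(int doc); // null means "doc does not match"
        String explain(int doc);
    }

    /** Wrapper weight: proxies to the inner weight, filtering by freq in scorer/explain. */
    static final class FreqFilterWeight implements SimpleWeight {
        private final SimpleWeight sub;
        private final float minFreq, maxFreq, boost;

        FreqFilterWeight(SimpleWeight sub, float minFreq, float maxFreq, float boost) {
            this.sub = sub;
            this.minFreq = minFreq;
            this.maxFreq = maxFreq;
            this.boost = boost;
        }

        @Override public float getValueForNormalization() {
            return sub.getValueForNormalization(); // plain delegation
        }

        @Override public void normalize(float norm, float topLevelBoost) {
            // Assumption: the wrapper's own boost is folded into the top-level boost.
            sub.normalize(norm, topLevelBoost * boost);
        }

        @Override public SimpleScorer scorer(int doc) {
            SimpleScorer s = sub.scorer(doc);
            return (s != null && inBounds(s.freq())) ? s : null;
        }

        @Override public String explain(int doc) {
            // Check the freq via the sub scorer first; only delegate when it is in bounds.
            SimpleScorer s = sub.scorer(doc);
            if (s == null || !inBounds(s.freq())) {
                return "no match: freq outside [" + minFreq + ", " + maxFreq + "]";
            }
            return sub.explain(doc);
        }

        private boolean inBounds(float freq) {
            return freq >= minFreq && freq <= maxFreq;
        }
    }

    /** Toy inner weight backed by a doc -> term frequency map, for illustration only. */
    static final class ToyTermWeight implements SimpleWeight {
        private final Map<Integer, Float> freqs;
        private float queryWeight = 1f;

        ToyTermWeight(Map<Integer, Float> freqs) {
            this.freqs = freqs;
        }

        @Override public float getValueForNormalization() {
            return queryWeight * queryWeight;
        }

        @Override public void normalize(float norm, float topLevelBoost) {
            queryWeight *= norm * topLevelBoost;
        }

        @Override public SimpleScorer scorer(int doc) {
            final Float f = freqs.get(doc);
            if (f == null) {
                return null;
            }
            final float w = queryWeight;
            return new SimpleScorer() {
                @Override public float freq() { return f; }
                @Override public float score() { return f * w; }
            };
        }

        @Override public String explain(int doc) {
            Float f = freqs.get(doc);
            return f == null ? "no match: term absent" : "match: freq=" + f;
        }
    }
}
```

In this sketch explain pays for a second pass over the sub weight, which matches the
"correct rather than efficient" trade-off suggested above.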

> FreqFilteringScorerWrapper and min/max freq options on TermQuery
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3395
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3395
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Hoss Man
>         Attachments: LUCENE-3395.patch
>
>
> A Solr User was asking about how to specify a minimum tf when searching for a term (i.e. documents matching "dog" at least 3 times).
> Based on a conversation with rmuir on IRC, that led me to realize that we now explicitly expose a general "freq()" method on Scorer, and that min/max freq constraints could be implemented as a general Scorer Wrapper.
> I propose that we add such a wrapper, and add setMinFreq(float)/setMaxFreq(float) methods to TermQuery (similar to the minNumShouldMatches and disableCoord type setters in BooleanQuery) that cause it to be used automatically.
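
As a rough illustration of the "general Scorer Wrapper" idea in the description above, the
sketch below skips documents whose freq falls outside [minFreq, maxFreq]. It is not the
attached patch and does not use Lucene's real Scorer class: SimpleScorer, FreqFilteringScorer,
ArrayScorer and the NO_MORE_DOCS constant defined here are hypothetical stand-ins.

```java
class FreqFilterScorerSketch {

    /** Sentinel returned when iteration is exhausted (defined locally for this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical minimal scorer: iterates docs and exposes freq for the current doc. */
    interface SimpleScorer {
        int nextDoc();
        float freq();
    }

    /** Wrapper that only surfaces docs whose freq lies within [minFreq, maxFreq]. */
    static final class FreqFilteringScorer implements SimpleScorer {
        private final SimpleScorer in;
        private final float minFreq;
        private final float maxFreq;

        FreqFilteringScorer(SimpleScorer in, float minFreq, float maxFreq) {
            this.in = in;
            this.minFreq = minFreq;
            this.maxFreq = maxFreq;
        }

        @Override public int nextDoc() {
            int doc;
            // Advance the wrapped scorer until a doc passes the freq check.
            while ((doc = in.nextDoc()) != NO_MORE_DOCS) {
                float f = in.freq();
                if (f >= minFreq && f <= maxFreq) {
                    return doc;
                }
            }
            return NO_MORE_DOCS;
        }

        @Override public float freq() {
            return in.freq();
        }
    }

    /** Toy scorer over parallel arrays of doc ids and freqs, for illustration only. */
    static final class ArrayScorer implements SimpleScorer {
        private final int[] docs;
        private final float[] freqs;
        private int pos = -1;

        ArrayScorer(int[] docs, float[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override public int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        @Override public float freq() {
            return freqs[pos];
        }
    }
}
```

A query-level setMinFreq/setMaxFreq could then decide whether to wrap the underlying scorer
at all, leaving the unconstrained case untouched.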

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
