lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3395) FreqFilteringScorerWrapper and min/max freq options on TermQuery
Date Mon, 22 Aug 2011 18:07:29 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-3395:
-----------------------------

    Attachment: LUCENE-3395.patch

Patch containing FreqFilteringScorerWrapper and a test.  I haven't yet done the work on TermQuery
to add options for this; I wanted to see what people thought of it first and get some code
review ... it's been a while since I touched code this deep in the stack.

a few things to note:

* the entire class is marked experimental since its whole existence depends on an experimental
method of the Scorer API.  that said: even if we rip out Scorer.freq, i think we can still
support this as a TermQuery feature since freq info will always be available from TermScorer.
* the test currently has some nocommits related to an NPE when trying to check the edge case
of wrapping a Scorer that matches nothing.  i think the problem relates to some code i cut/pasted
from TestTermScorer for getting a Scorer from a Query+Searcher to use in the test, but it
seems to optimize the Scorer to null when it matches nothing  (even if i didn't have this
NPE, that getScorer method would be marked nocommit until someone verified it was in fact
a "valid" way for a test to get direct access to a Scorer)
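
For readers without the patch open, here is a minimal sketch of the freq-filtering idea. It is
not the code in LUCENE-3395.patch and does not use Lucene's Scorer API: DocFreqIterator,
ArrayDocFreqIterator, FreqFilteringIterator and FreqFilterDemo are hypothetical stand-ins, and
treating a null inner iterator as "matches nothing" is an assumption about one way the edge case
above could be handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the part of a scorer this sketch needs; NOT Lucene's Scorer. */
interface DocFreqIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next matching doc and returns its id, or NO_MORE_DOCS when exhausted. */
    int nextDoc();

    /** Frequency for the current doc; only valid after nextDoc() returned a real doc id. */
    float freq();
}

/** Walks parallel arrays of doc ids and freqs; exists only to feed the wrapper below. */
final class ArrayDocFreqIterator implements DocFreqIterator {
    private final int[] docs;
    private final float[] freqs;
    private int pos = -1;

    ArrayDocFreqIterator(int[] docs, float[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    @Override
    public int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    @Override
    public float freq() {
        return freqs[pos];
    }
}

/** Passes through only those docs whose freq lies in [minFreq, maxFreq]. */
final class FreqFilteringIterator implements DocFreqIterator {
    private final DocFreqIterator in; // may be null (assumption: null means "matches nothing")
    private final float minFreq;
    private final float maxFreq;

    FreqFilteringIterator(DocFreqIterator in, float minFreq, float maxFreq) {
        this.in = in;
        this.minFreq = minFreq;
        this.maxFreq = maxFreq;
    }

    @Override
    public int nextDoc() {
        if (in == null) {
            return NO_MORE_DOCS; // nothing to wrap, so nothing matches
        }
        int doc;
        // Skip over docs whose freq falls outside the allowed range.
        while ((doc = in.nextDoc()) != NO_MORE_DOCS) {
            float f = in.freq();
            if (f >= minFreq && f <= maxFreq) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    @Override
    public float freq() {
        // Never reached with a null inner iterator, since nextDoc() then yields no real doc.
        return in.freq();
    }
}

final class FreqFilterDemo {
    /** Drains an iterator into a list of doc ids. */
    static List<Integer> collect(DocFreqIterator it) {
        List<Integer> out = new ArrayList<>();
        int doc;
        while ((doc = it.nextDoc()) != DocFreqIterator.NO_MORE_DOCS) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        DocFreqIterator inner = new ArrayDocFreqIterator(
                new int[] {1, 4, 7, 9}, new float[] {1f, 3f, 5f, 2f});
        // Keep docs with 2 <= freq <= 3.
        System.out.println(collect(new FreqFilteringIterator(inner, 2f, 3f))); // prints [4, 9]
    }
}
```

The filter lives entirely in nextDoc(), so a caller that only iterates never sees a doc outside
the freq range; a real Scorer wrapper would presumably need the same check in its skip/advance
path as well.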

> FreqFilteringScorerWrapper and min/max freq options on TermQuery
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3395
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3395
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Hoss Man
>         Attachments: LUCENE-3395.patch
>
>
> A Solr user was asking how to specify a minimum tf when searching for a term (ie: documents
> matching "dog" at least 3 times).
> Based on a conversation with rmuir on IRC, that led me to realize that we now explicitly expose
> a general "freq()" method on Scorer, and that min/max freq constraints could be implemented
> as a general Scorer wrapper.
> I propose that we add such a wrapper, and add setMinFreq(float)/setMaxFreq(float) methods
> to TermQuery (similar to the minNumShouldMatches and disableCoord type setters in BooleanQuery)
> that cause it to be used automatically.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
