lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3395) FreqFilteringScorerWrapper and min/max freq options on TermQuery
Date Mon, 22 Aug 2011 18:29:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088897#comment-13088897 ]

Hoss Man commented on LUCENE-3395:
----------------------------------

bq. should we add a doNext() used by both nextDoc() and advance()

Ah .. yeah, good call.  it took me a few minutes to figure out the "right" way to do advance,
and i didn't notice they wound up being essentially a cut/paste.
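
For illustration, here is a minimal sketch of the shared doNext() idea. It does not use the real Lucene classes: FreqScorer is a hypothetical stand-in for the handful of Scorer methods involved (docID, nextDoc, advance, freq), ArrayFreqScorer is a toy in-memory scorer for trying it out, and the class names and inclusive min/max semantics are assumptions, not what the attached patch does.

```java
// Hypothetical stand-in for the parts of a Scorer used here; NOT the real Lucene API.
interface FreqScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
    float freq();
}

// Wraps another scorer and only surfaces docs whose freq is in [minFreq, maxFreq].
final class FreqFilteringScorerSketch implements FreqScorer {
    private final FreqScorer in;
    private final float minFreq;
    private final float maxFreq;

    FreqFilteringScorerSketch(FreqScorer in, float minFreq, float maxFreq) {
        this.in = in;
        this.minFreq = minFreq;
        this.maxFreq = maxFreq;
    }

    public int docID() { return in.docID(); }
    public float freq() { return in.freq(); }

    // Both entry points position the wrapped scorer, then share the filtering loop.
    public int nextDoc() { return doNext(in.nextDoc()); }
    public int advance(int target) { return doNext(in.advance(target)); }

    // Starting from the doc the wrapped scorer is on, skip forward until the
    // freq constraint holds or the wrapped scorer is exhausted.
    private int doNext(int doc) {
        while (doc != NO_MORE_DOCS) {
            float f = in.freq();
            if (f >= minFreq && f <= maxFreq) {
                return doc;
            }
            doc = in.nextDoc();
        }
        return doc;
    }
}

// Toy scorer over parallel arrays of sorted doc ids and their freqs.
final class ArrayFreqScorer implements FreqScorer {
    private final int[] docs;
    private final float[] freqs;
    private int i = -1; // -1 means "not positioned yet"

    ArrayFreqScorer(int[] docs, float[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    public int docID() {
        if (i < 0) return -1;
        return i < docs.length ? docs[i] : NO_MORE_DOCS;
    }

    public int nextDoc() {
        if (i < docs.length) i++;
        return docID();
    }

    // Linear scan to the first doc >= target that is after the current doc.
    public int advance(int target) {
        int d;
        do { d = nextDoc(); } while (d < target);
        return d;
    }

    public float freq() { return freqs[i]; }
}
```

With docs {1, 3, 5, 8} and freqs {1, 3, 2, 4}, wrapping with min 2 and max 3 means nextDoc() skips doc 1 and lands on doc 3, advance(4) lands on doc 5, and the following nextDoc() runs past doc 8 to NO_MORE_DOCS.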

bq. Also, I don't think we should hook this into TermQuery? There's no reason we can't just
make it a general wrapper to any scorer that exposes freq(), e.g. phrase too.

Yeah, i wasn't sure how people would feel about that (hence i didn't work on it yet)

My suggestion was to do both -- that this be a public Scorer that advanced users could use
to hook into anonymous subclasses of other queries; but also hook it directly into TermQuery
so that it was easier to use in (what seems to me like) the simple/common case. i mainly don't
see any harm in adding these options to TermQuery: we can do it so that TermScorer is only
wrapped if these options are used, so there's no performance downside for existing users who
don't care.
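
A rough sketch of the "only wrap if the options are used" idea, kept independent of Lucene. FreqOptions, isConstrained() and wrapIfNeeded() are hypothetical names invented for this example, and the defaults (0 and positive infinity meaning "no constraint") are an assumption:

```java
// Hypothetical holder for the proposed min/max freq options; not part of Lucene.
final class FreqOptions {
    private float minFreq = 0f;                      // assumed default: no lower bound
    private float maxFreq = Float.POSITIVE_INFINITY; // assumed default: no upper bound

    void setMinFreq(float minFreq) { this.minFreq = minFreq; }
    void setMaxFreq(float maxFreq) { this.maxFreq = maxFreq; }

    // True only if the caller changed at least one bound from its default.
    boolean isConstrained() {
        return minFreq > 0f || maxFreq < Float.POSITIVE_INFINITY;
    }

    // Hands back the very same scorer object unless a constraint was set,
    // so callers who never touch the options pay nothing extra.
    <S> S wrapIfNeeded(S scorer, java.util.function.UnaryOperator<S> wrapper) {
        return isConstrained() ? wrapper.apply(scorer) : scorer;
    }
}
```

The wrapper argument would be whatever builds the freq filtering scorer around the plain one; with untouched options it is never invoked.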

Perhaps i'm missing something though:

1) Is there a straightforward / easy template for non-expert java users to take an arbitrary
Query and override its Scorer with a wrapper like this?
2) Is there a simple Query Wrapper we could write in conjunction with this Scorer Wrapper
to make that trivial for users?

...if so, then i'm sold.  let's not fuck with TermQuery.

(In general, the vagueness of what "freq" means for anything other than Terms (ie: with sloppy
phrases the value of 1.0 can mean 1 exact match, or 2 matches with different slops, etc...)
is the reason i figured this would primarily be useful when dealing with TermQueries and should
be particularly easy to use there.  Particularly since Scorer.freq is marked experimental
and might go away -- which would mean this public FreqFilteringScorerWrapper would have to
go away;  if that happens, i still think we should support something like this explicitly
for TermQuery)


> FreqFilteringScorerWrapper and min/max freq options on TermQuery
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3395
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3395
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Hoss Man
>         Attachments: LUCENE-3395.patch
>
>
> A Solr User was asking about how to specify a minimum tf when searching for a term (ie:
> documents matching "dog" at least 3 times).
> Based on a conversation with rmuir on IRC, that led me to realize that we now explicitly
> expose a general "freq()" method on Scorer, and that min/max freq constraints could be implemented
> as a general Scorer Wrapper.
> I propose that we add such a wrapper, and add setMinFreq(float)/setMaxFreq(float) methods
> to TermQuery (similar to the minNumShouldMatches and disableCoord type setters in BooleanQuery)
> that cause it to be used automatically.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org