lucene-dev mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: using solr to do a 'match'
Date Wed, 11 Apr 2012 08:32:01 GMT
Hi,

This use case is similar to the problem of matching boolean expressions; there
is a recent thread about it. My idea is to introduce a disjunction query with a
dynamic mm (the minShouldMatch parameter,
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)),
i.e. 'match these clauses disjunctively, but for every document use the value
from the field cache of field xxxCount as the minShouldMatch parameter'. Norms
could also be used as a source for the dynamic mm values.
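
To make the idea concrete, here is a minimal sketch in plain Python. It is not
Lucene code and uses no Lucene API; the function name, the document layout and
the per-document "min_should_match" value (standing in for what would be read
from the field cache of xxxCount) are all hypothetical. It only shows the
matching rule: count how many query terms hit a document and compare that count
against the document's own threshold.

```python
def dynamic_mm_match(query_terms, docs):
    """Return ids of docs whose matched-term count reaches the doc's own threshold.

    `docs` maps a doc id to a dict with:
      "terms"            - the indexed terms of the field (hypothetical layout)
      "min_should_match" - per-document threshold, standing in for a value
                           that would come from a field cache or norms
    """
    query = set(query_terms)
    hits = []
    for doc_id, doc in docs.items():
        # Disjunctive match: count the distinct query terms present in the doc.
        matched = len(query & set(doc["terms"]))
        # The threshold is per document, not one global value for the query.
        if matched >= doc["min_should_match"]:
            hits.append(doc_id)
    return hits


# Hypothetical data: each title requires most of its own terms to be matched.
docs = {
    "song-1": {"terms": ["dust", "in", "the", "wind"], "min_should_match": 3},
    "song-2": {"terms": ["dust"], "min_should_match": 1},
}
```

With this data, the query "dust in wind" reaches the threshold of "dust in the
wind", while the query "dust" alone does not; it only matches the one-term
title.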

Wdyt?

On Wed, Apr 11, 2012 at 10:08 AM, Li Li <fancyerii@gmail.com> wrote:

> It's not possible now because Lucene doesn't support this.
> When doing a disjunction query, it only records how many terms match the
> document.
> I think this is a common requirement for many users.
> I suggest Lucene should divide the scorer into a matcher and a scorer.
> The matcher just returns which doc is matched and why/how the doc is
> matched.
> Especially for a disjunction query, it should tell which terms match and
> possibly other information such as tf/idf and the distance between terms
> (to support proximity search).
> That's the matcher's job. Then the scorer (a ranking algorithm) uses a
> flexible algorithm to score the document, and the collector can collect it.
>
> On Wed, Apr 11, 2012 at 10:28 AM, Chris Book <chrisbook@gmail.com> wrote:
>
> > Hello, I have a Solr index running that is working very well as a search.
> > But I want to add the ability (if possible) to use it to do matching.  The
> > problem is that by default it is only looking for all the input terms to be
> > present, and it doesn't give me any indication as to how many terms in the
> > target field were not specified by the input.
> >
> > For example, if I'm trying to match to the song title "dust in the wind",
> > I'm correctly getting a match if the input query is "dust in wind".  But I
> > don't want to get a match if the input is just "dust".  Although as a
> > search "dust" should return this result, I'm looking for some way to filter
> > this out based on some indication that the input isn't close enough to the
> > output.  Perhaps if I could get information that the number of input
> > terms is much less than the number of terms in the field.  Or something
> > else along those lines?
> >
> > I realize that this isn't the typical use case for a search, but I'm just
> > looking for some suggestions as to how I could improve the above example a
> > bit.
> >
> > Thanks,
> > Chris
> >
>



-- 
Sincerely yours
Mikhail Khludnev
gedel@yandex.ru

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
