lucene-dev mailing list archives

From Li Li <fancye...@gmail.com>
Subject Fwd: using solr to do a 'match'
Date Wed, 11 Apr 2012 08:59:34 GMT
---------- Forwarded message ----------
From: Li Li <fancyerii@gmail.com>
Date: Wed, Apr 11, 2012 at 4:59 PM
Subject: Re: using solr to do a 'match'
To: solr-user@lucene.apache.org


I searched my mail but found nothing.
The thread that turns up for the keywords "boolean expression" is "Indexing
Boolean Expressions" from joaquin.delgado.
To tell which terms matched, for BooleanScorer2, a simple method is to
modify DisjunctionSumScorer and add a BitSet to record the matched scorers.
When the collector collects a document, it can get the scorer and recursively
find the matched terms.
But I think it may be better to add a component, perhaps named matcher, that
does the matching job; the scorer would then use the information from the
matcher to do the ranking.
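
To make the BitSet idea concrete, here is a minimal sketch in plain Java. It
does not touch Lucene at all: MatchedClauseSketch, matchClauses and
matchedTerms are hypothetical names, and a document is modelled as a plain set
of terms. It only illustrates recording which clauses of a disjunction matched
so that something downstream (a collector, say) can recover the terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a disjunction scorer; this is NOT Lucene's DisjunctionSumScorer. */
class MatchedClauseSketch {

    /** Returns a BitSet with bit i set when query term i occurs in the document. */
    public static BitSet matchClauses(List<String> queryTerms, Set<String> docTerms) {
        BitSet matched = new BitSet(queryTerms.size());
        for (int i = 0; i < queryTerms.size(); i++) {
            if (docTerms.contains(queryTerms.get(i))) {
                matched.set(i);
            }
        }
        return matched;
    }

    /** Maps the recorded bits back to the query terms, as a collector could do. */
    public static List<String> matchedTerms(List<String> queryTerms, BitSet matched) {
        List<String> result = new ArrayList<>();
        for (int i = matched.nextSetBit(0); i >= 0; i = matched.nextSetBit(i + 1)) {
            result.add(queryTerms.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("dust", "in", "wind", "storm");
        Set<String> doc = new HashSet<>(Arrays.asList("dust", "in", "the", "wind"));
        BitSet matched = matchClauses(query, doc);
        // Prints the matched terms in query order; cardinality() is the match count.
        System.out.println(matchedTerms(query, matched) + " " + matched.cardinality());
    }
}
```

The cardinality of the BitSet is the count a disjunction already tracks today;
the bits themselves are the extra "which term" information.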

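For the dynamic mm suggestion quoted below, a rough sketch of the per-document
check might look like this. Again this is plain Java, not a Lucene query:
DynamicMinShouldMatchSketch is a hypothetical name, termCount stands in for the
value that would be read from the field cache of the xxxCount field, and the
fraction parameter is my own assumption about how that count could be turned
into a minShouldMatch value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of a disjunction whose minShouldMatch varies per document. */
class DynamicMinShouldMatchSketch {

    /** Counts how many distinct document terms also appear in the query. */
    public static int countMatches(Set<String> queryTerms, Set<String> docTerms) {
        int n = 0;
        for (String term : docTerms) {
            if (queryTerms.contains(term)) {
                n++;
            }
        }
        return n;
    }

    /**
     * Accepts the document when at least ceil(fraction * termCount) of its terms match.
     * termCount is the per-document value (the "xxxCount" field in the thread);
     * fraction is an assumed tuning knob, not something Lucene provides.
     */
    public static boolean matches(Set<String> queryTerms, Set<String> docTerms,
                                  int termCount, double fraction) {
        int minShouldMatch = (int) Math.ceil(fraction * termCount);
        return countMatches(queryTerms, docTerms) >= minShouldMatch;
    }

    public static void main(String[] args) {
        Set<String> title = new HashSet<>(Arrays.asList("dust", "in", "the", "wind"));
        Set<String> close = new HashSet<>(Arrays.asList("dust", "in", "wind"));
        Set<String> tooShort = new HashSet<>(Arrays.asList("dust"));
        // With fraction 0.75 a four-term title needs three matching terms.
        System.out.println(matches(close, title, title.size(), 0.75));
        System.out.println(matches(tooShort, title, title.size(), 0.75));
    }
}
```

With these assumed numbers, "dust in wind" is accepted against "dust in the
wind" while "dust" alone is rejected, which is the behaviour asked for in the
original question.
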

On Wed, Apr 11, 2012 at 4:32 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Hi,
>
> This use case is similar to the problem of matching boolean expressions.
> You can find a recent thread about it. I have an idea that we could
> introduce a disjunction query with a dynamic mm (the minShouldMatch
> parameter,
>
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)
> ),
> i.e. 'match these clauses disjunctively, but for every document use the
> value from the field cache of field xxxCount as the minShouldMatch
> parameter'. Norms could also be used as a source for dynamic mm values.
>
> Wdyt?
>
> On Wed, Apr 11, 2012 at 10:08 AM, Li Li <fancyerii@gmail.com> wrote:
>
> > It's not possible now because Lucene doesn't support this.
> > When doing a disjunction query, it only records how many terms match the
> > document.
> > I think this is a common requirement for many users.
> > I suggest Lucene should divide the scorer into a matcher and a scorer.
> > The matcher just returns which doc is matched and why/how the doc is
> > matched.
> > Especially for a disjunction query, it should tell which terms match and
> > possibly other information such as tf/idf and the distance between terms
> > (to support proximity search).
> > That's the matcher's job. Then the scorer (a ranking algorithm) uses a
> > flexible algorithm to score the document, and the collector can collect
> > it.
> >
> > On Wed, Apr 11, 2012 at 10:28 AM, Chris Book <chrisbook@gmail.com> wrote:
> >
> > > Hello, I have a Solr index running that is working very well as a search.
> > > But I want to add the ability (if possible) to use it to do matching. The
> > > problem is that by default it only looks for all the input terms to be
> > > present, and it doesn't give me any indication of how many terms in the
> > > target field were not specified by the input.
> > >
> > > For example, if I'm trying to match the song title "dust in the wind",
> > > I'm correctly getting a match if the input query is "dust in wind". But I
> > > don't want to get a match if the input is just "dust". Although as a
> > > search "dust" should return this result, I'm looking for some way to filter
> > > this out based on some indication that the input isn't close enough to the
> > > output. Perhaps if I could get information that the number of input
> > > terms is much less than the number of terms in the field. Or something
> > > else along those lines?
> > >
> > > I realize that this isn't the typical use case for a search, but I'm just
> > > looking for some suggestions as to how I could improve the above example a
> > > bit.
> > >
> > > Thanks,
> > > Chris
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> gedel@yandex.ru
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
>
