lucene-java-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: Tracking that all query terms are matched in one document
Date Wed, 13 Dec 2017 10:32:59 GMT
There are two algorithms for scoring a disjunction: term-at-a-time and
doc-at-a-time. The former was called BooleanScorer and the latter was called
BooleanScorer2.
I remember that they were drastically renamed and/or replaced with
BulkScorer or so. Anyway, you need to find a way to prevent term-at-a-time
scoring, which is the case where FakeScorer is injected.
You need to make it score doc-at-a-time. As I told you, it's a long way around.
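
To illustrate the difference, here is a toy model of the two scoring orders. It is a sketch under stated assumptions and not Lucene code: the class ScoringOrderSketch, its methods, the sample postings, and the allMatchFactor parameter are all hypothetical; only the field names and boosts (5, 5, 3, 4) come from the query quoted below. It shows why a collector can tell which clauses matched only when the clause cursors are advanced one document at a time.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the two disjunction scoring orders; hypothetical, not Lucene code. */
class ScoringOrderSketch {

    /** Hypothetical postings: field name -> sorted doc IDs containing the query string. */
    static Map<String, int[]> samplePostings() {
        Map<String, int[]> p = new LinkedHashMap<>();
        p.put("param_vendor", new int[] {0, 2, 5});
        p.put("param_model", new int[] {2, 3});
        p.put("param_value", new int[] {1, 2, 5});
        p.put("param_name", new int[] {2, 5});
        return p;
    }

    /** Boosts taken from the BoostQuery values in the quoted query. */
    static Map<String, Float> sampleBoosts() {
        Map<String, Float> b = new LinkedHashMap<>();
        b.put("param_vendor", 5f);
        b.put("param_model", 5f);
        b.put("param_value", 3f);
        b.put("param_name", 4f);
        return b;
    }

    /**
     * Term-at-a-time: each clause is drained in turn into per-doc accumulators.
     * Only the summed score survives; which clauses contributed is lost.
     */
    static float[] termAtATime(Map<String, int[]> postings, Map<String, Float> boosts, int maxDoc) {
        float[] acc = new float[maxDoc];
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                acc[doc] += boosts.get(e.getKey());
            }
        }
        return acc;
    }

    /**
     * Doc-at-a-time: all clause cursors advance together, so for every doc we
     * can see exactly which clauses are positioned on it.
     */
    static Map<Integer, Set<String>> docAtATime(Map<String, int[]> postings) {
        Map<String, Integer> cursor = new LinkedHashMap<>();
        for (String field : postings.keySet()) {
            cursor.put(field, 0);
        }
        Map<Integer, Set<String>> matched = new TreeMap<>();
        while (true) {
            // The next candidate is the smallest doc ID under any cursor.
            int doc = Integer.MAX_VALUE;
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                int pos = cursor.get(e.getKey());
                if (pos < e.getValue().length) {
                    doc = Math.min(doc, e.getValue()[pos]);
                }
            }
            if (doc == Integer.MAX_VALUE) {
                break; // every clause is exhausted
            }
            Set<String> hits = new LinkedHashSet<>();
            for (Map.Entry<String, int[]> e : postings.entrySet()) {
                int pos = cursor.get(e.getKey());
                if (pos < e.getValue().length && e.getValue()[pos] == doc) {
                    hits.add(e.getKey());
                    cursor.put(e.getKey(), pos + 1);
                }
            }
            matched.put(doc, hits);
        }
        return matched;
    }

    /** Sum of clause boosts, multiplied by allMatchFactor only when every clause matched. */
    static float score(Set<String> hits, Map<String, Float> boosts, int clauseCount, float allMatchFactor) {
        float sum = 0f;
        for (String field : hits) {
            sum += boosts.get(field);
        }
        return hits.size() == clauseCount ? sum * allMatchFactor : sum;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = samplePostings();
        Map<String, Float> boosts = sampleBoosts();
        for (Map.Entry<Integer, Set<String>> e : docAtATime(postings).entrySet()) {
            float s = score(e.getValue(), boosts, postings.size(), 2f);
            System.out.println("doc=" + e.getKey() + " matched=" + e.getValue() + " score=" + s);
        }
    }
}
```

In the term-at-a-time variant there is no point at which a collector could ask "which clauses are on this doc?", which is roughly the situation FakeScorer signals. The real BooleanScorer works on windows of documents rather than draining each clause completely, so treat this only as a picture of the idea.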

On Wed, Dec 13, 2017 at 11:55 AM, Vadim Gindin <vgindin@detectum.com> wrote:

> Hi Michael,
>
> I've tried to implement this but ran into the following problem. To
> recap, my query combines several ConstantScoreQuery clauses in a
> BooleanQuery. I wrote a custom Collector as follows:
>
> @Override
> public void setScorer(Scorer scorer) throws IOException {
>     this.scorer = scorer;
>
> }
>
> @Override
> public void collect(int doc) throws IOException {
>     System.out.println("doc=" + doc);
>     diveIntoScorers(this.scorer);
> }
>
> and when I dive recursively into the child scorers I hit an
> UnsupportedOperationException. It happens because of the following
> code in BooleanScorer:
>
> @Override
> public int score(LeafCollector collector, Bits acceptDocs, int min,
> int max) throws IOException {
>   fakeScorer.doc = -1;
>   collector.setScorer(fakeScorer);
>
> Later fakeScorer throws an exception.
>
> How did you implement similar functionality?
> How can I avoid this?
>
> Thanks,
> Vadim Gindin
>
> On Fri, Dec 8, 2017 at 2:01 PM, Vadim Gindin <vgindin@detectum.com> wrote:
>
> > Thanks for your help. I'll try that.
> >
> > On Tue, Dec 5, 2017 at 4:18 PM, Mikhail Khludnev <mkhl@apache.org> wrote:
> >
> >> Vadim,
> >> You can create a collector which checks Scorer.getChildren()
> >> https://issues.apache.org/jira/browse/LUCENE-7628 but it's quite
> >> cumbersome.
> >> I'd suggest avoiding this if possible. However, Elastic does something
> >> like this with named queries or so.
> >> I talked about this a few years ago:
> >> https://www.youtube.com/watch?v=sGVyUdNGBgw
> >>
> >> On Tue, Dec 5, 2017 at 12:36 PM, Vadim Gindin <vgindin@detectum.com>
> >> wrote:
> >>
> >> > I'm not sure that I will be able to track that different terms
> >> > were matched to the same document...
> >> >
> >> > I'm thinking more about a slightly different way: when a query scores
> >> > some document, save the query term for that document somewhere. Probably
> >> > it would be some map in some class SearchContext. I could write something
> >> > like this:
> >> >
> >> > // Does such a search context exist in Lucene? Maybe QueryContext?
> >> > SearchContext sc = getSearchContext();
> >> > // docTerms here is a Map<Integer, List<String>>, where the key is a
> >> > // document ID and the value is the list of terms that matched it.
> >> > sc.getDocTerms().get(docID).add(query.getTerm());
> >> >
> >> > I need to save somewhere the document ID and the term that matched that
> >> > document. Could somebody advise me on an appropriate place?
> >> >
> >> > Regards,
> >> > Vadim Gindin
> >> >
> >> >
> >> > On Tue, Dec 5, 2017 at 12:04 PM, Vadim Gindin <vgindin@detectum.com>
> >> > wrote:
> >> >
> >> > > For example like this:
> >> > >
> >> > > BooleanQuery.Builder expected = new BooleanQuery.Builder();
> >> > >
> >> > > Query param_vendor = new BoostQuery(new ConstantScoreQuery(
> >> > >         new TermQuery(new Term("param_vendor", queryStr))), 5f);
> >> > > Query param_model = new BoostQuery(new ConstantScoreQuery(
> >> > >         new TermQuery(new Term("param_model", queryStr))), 5f);
> >> > > Query param_value = new BoostQuery(new ConstantScoreQuery(
> >> > >         new TermQuery(new Term("param_value", queryStr))), 3f);
> >> > > Query param_name = new BoostQuery(new ConstantScoreQuery(
> >> > >         new TermQuery(new Term("param_name", queryStr))), 4f);
> >> > >
> >> > > BooleanQuery bq = expected
> >> > >         .add(param_vendor, BooleanClause.Occur.SHOULD)
> >> > >         .add(param_model, BooleanClause.Occur.SHOULD)
> >> > >         .add(param_value, BooleanClause.Occur.SHOULD)
> >> > >         .add(param_name, BooleanClause.Occur.SHOULD)
> >> > >         .setMinimumNumberShouldMatch(1)
> >> > >         .build();
> >> > >
> >> > > return new BoostQuery(bq, queryBoost);
> >> > >
> >> > >
> >> > > Vadim
> >> > >
> >> > > On Tue, Dec 5, 2017 at 9:24 AM, Michael Sokolov <msokolov@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Well how did you make the original query?
> >> > >>
> >> > >> On Dec 4, 2017 12:05 PM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> >> > >>
> >> > >> > Yes, thanks. My question is exactly about how to create "another
> >> > >> > extra query that requires all the terms in the original query"
> >> > >> >
> >> > >> > On Mon, Dec 4, 2017 at 6:50 PM, Michael Sokolov <msokolov@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> > > I'm just saying, that when you form your query, you could also
> >> > >> > > create another extra query that requires all the terms in the
> >> > >> > > original query, and then combine it with the original query in a
> >> > >> > > boolean where the original query is required and the extra query
> >> > >> > > is optional. That will give a boost when all the terms are found,
> >> > >> > > although I think the scores will be added, not multiplied.
> >> > >> > >
> >> > >> > > On Dec 4, 2017 5:22 AM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> >> > >> > >
> >> > >> > > > Thanks, Michael!
> >> > >> > > >
> >> > >> > > > Yes, I'm sure. Could you explain your proposal in more detail?
> >> > >> > > >
> >> > >> > > > Regards,
> >> > >> > > > Vadim Gindin
> >> > >> > > >
> >> > >> > > > On Mon, Dec 4, 2017 at 3:18 PM, Michael Sokolov <msokolov@gmail.com>
> >> > >> > > > wrote:
> >> > >> > > >
> >> > >> > > > > You could combine a Boolean AND query with the same terms, as
> >> > >> > > > > an optional clause. Are you sure about the requirement to
> >> > >> > > > > multiply the score in that case?
> >> > >> > > > >
> >> > >> > > > > On Dec 4, 2017 5:13 AM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> >> > >> > > > >
> >> > >> > > > > > Hi all.
> >> > >> > > > > >
> >> > >> > > > > > I need to track that all query terms are matched in one
> >> > >> > > > > > document. When all terms are matched I need to multiply the
> >> > >> > > > > > score of such a document by some constant coefficient.
> >> > >> > > > > >
> >> > >> > > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
