lucene-java-user mailing list archives

From Vadim Gindin <vgin...@detectum.com>
Subject Re: Tracking that all query terms are matched in one document
Date Thu, 14 Dec 2017 09:45:10 GMT
Thank you

On Wed, Dec 13, 2017 at 3:32 PM, Mikhail Khludnev <mkhl@apache.org> wrote:

> There are two algorithms for scoring a disjunction: term-at-a-time and
> doc-at-a-time. The former was called BooleanScorer and the latter was called
> BooleanScorer2.
> I remember that they were drastically renamed and/or replaced with
> BulkScorer or so. Anyway, you need to find a way to prevent term-at-a-time
> scoring, which is when FakeScorer is injected.
> You need to make it score doc-at-a-time. As I told you, it's a long way to go.
>
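To make the distinction concrete, here is a minimal plain-Java sketch that scores a disjunction over sorted posting lists both ways. It is a simplified model under stated assumptions, not Lucene's actual implementation: the class `DisjunctionScoringSketch`, its method names, and the array-based posting lists are all hypothetical. The point it illustrates is that the term-at-a-time variant keeps only an accumulated score per document, while the doc-at-a-time variant still knows which clauses sit on the current document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of disjunction scoring; hypothetical names, not Lucene API.
// Assumes each posting list holds distinct doc IDs in ascending order.
final class DisjunctionScoringSketch {

    /** Term-at-a-time: consume each posting list in turn, summing into per-doc accumulators. */
    static Map<Integer, Float> termAtATime(int[][] postings, float[] clauseScores) {
        Map<Integer, Float> accumulators = new TreeMap<>();
        for (int clause = 0; clause < postings.length; clause++) {
            for (int doc : postings[clause]) {
                accumulators.merge(doc, clauseScores[clause], Float::sum);
            }
        }
        // Only the summed score survives; which clauses matched a doc is not recorded.
        return accumulators;
    }

    /** Doc-at-a-time: advance all lists together, so the matching clauses are known per doc. */
    static Map<Integer, List<Integer>> docAtATime(int[][] postings) {
        int[] position = new int[postings.length];
        Map<Integer, List<Integer>> matchedClauses = new TreeMap<>();
        while (true) {
            // Find the smallest doc ID any clause is currently positioned on.
            int current = Integer.MAX_VALUE;
            for (int c = 0; c < postings.length; c++) {
                if (position[c] < postings[c].length) {
                    current = Math.min(current, postings[c][position[c]]);
                }
            }
            if (current == Integer.MAX_VALUE) {
                break; // every list is exhausted
            }
            // Collect the clauses positioned on that doc, then advance them.
            List<Integer> matched = new ArrayList<>();
            for (int c = 0; c < postings.length; c++) {
                if (position[c] < postings[c].length && postings[c][position[c]] == current) {
                    matched.add(c);
                    position[c]++;
                }
            }
            matchedClauses.put(current, matched);
        }
        return matchedClauses;
    }
}
```

For posting lists {1, 3, 5}, {3, 4}, {5} with clause scores 5, 3, 4, the first method gives document 3 a score of 8.0 without recording that clauses 0 and 1 produced it, while the second reports clauses [0, 1] for document 3.
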
> On Wed, Dec 13, 2017 at 11:55 AM, Vadim Gindin <vgindin@detectum.com>
> wrote:
>
> > Hi Michael,
> >
> > I tried to implement this but ran into the following problem. As a
> > reminder, my query combines several ConstantScoreQuery clauses in a
> > BooleanQuery. I wrote a custom Collector as follows:
> >
> > @Override
> > public void setScorer(Scorer scorer) throws IOException {
> >     // Keep the top-level scorer so collect() can inspect it later.
> >     this.scorer = scorer;
> > }
> >
> > @Override
> > public void collect(int doc) throws IOException {
> >     System.out.println("doc=" + doc);
> >     // Recursively visits the child scorers (helper not shown).
> >     diveIntoScorers(this.scorer);
> > }
> >
> > and when I recurse into the child scorers I get an
> > UnsupportedOperationException. It happens because of the following
> > code in BooleanScorer:
> >
> > @Override
> > public int score(LeafCollector collector, Bits acceptDocs, int min,
> > int max) throws IOException {
> >   fakeScorer.doc = -1;
> >   collector.setScorer(fakeScorer);
> >
> > Later, fakeScorer throws the exception.
> >
> > How did you implement your similar functionality?
> > How can I avoid this?
> >
> > Thanks,
> > Vadim Gindin
> >
> > On Fri, Dec 8, 2017 at 2:01 PM, Vadim Gindin <vgindin@detectum.com> wrote:
> >
> > > Thanks for your help. I'll try that.
> > >
> > > On Tue, Dec 5, 2017 at 4:18 PM, Mikhail Khludnev <mkhl@apache.org> wrote:
> > >
> > >> Vadim,
> > >> You can create a collector which checks Scorer.getChildren()
> > >> https://issues.apache.org/jira/browse/LUCENE-7628 but it's quite
> > >> cumbersome.
> > >> I'd suggest avoiding this if possible. However, Elastic does something
> > >> like this with named queries or so.
> > >> I talked about this a few years ago:
> > >> https://www.youtube.com/watch?v=sGVyUdNGBgw
> > >>
> > >> On Tue, Dec 5, 2017 at 12:36 PM, Vadim Gindin <vgindin@detectum.com>
> > >> wrote:
> > >>
> > >> > I'm not sure that this way I'll be able to track that different terms
> > >> > matched the same document...
> > >> >
> > >> > I'm thinking of a slightly different approach: when a query scores a
> > >> > document, save the query term for that document somewhere. Probably it
> > >> > would be a map in some SearchContext class. I could write something
> > >> > like this:
> > >> >
> > >> > // Does such a search context exist in Lucene? Maybe QueryContext?
> > >> > SearchContext sc = getSearchContext();
> > >> > // docTerms is a Map<Integer, List<String>>: the key is a document ID and
> > >> > // the value is the list of terms that matched this document.
> > >> > sc.getDocTerms().get(docID).add(query.getTerm());
> > >> >
> > >> > I need to store the document ID and the term that matched that
> > >> > document somewhere. Could somebody suggest an appropriate place?
> > >> >
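The bookkeeping described above could look roughly like this in plain Java. This is only a sketch: `MatchedTermTracker` and its methods are hypothetical names, not Lucene classes, and the question of where in Lucene the `recordMatch` call would live is left open. It uses a `Set` per document rather than a `List` so that repeated matches of the same term are not counted twice.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Lucene class: remembers which query terms
// matched each document and applies a multiplier when all of them did.
final class MatchedTermTracker {
    private final Set<String> queryTerms;
    private final Map<Integer, Set<String>> docTerms = new HashMap<>();

    MatchedTermTracker(Set<String> queryTerms) {
        this.queryTerms = new HashSet<>(queryTerms); // defensive copy
    }

    /** Record that the given term matched the given document. */
    void recordMatch(int docId, String term) {
        docTerms.computeIfAbsent(docId, id -> new HashSet<>()).add(term);
    }

    /** True only if every query term was recorded for this document. */
    boolean allTermsMatched(int docId) {
        Set<String> matched = docTerms.get(docId);
        return matched != null && matched.containsAll(queryTerms);
    }

    /** Multiply the score by the coefficient when all terms matched; otherwise leave it unchanged. */
    float adjustScore(int docId, float score, float coefficient) {
        return allTermsMatched(docId) ? score * coefficient : score;
    }
}
```

One design caveat: a map keyed by document ID grows with the number of matching documents, so in a real search it would probably need to be scoped per segment or cleared as documents are collected.
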
> > >> > Regards,
> > >> > Vadim Gindin
> > >> >
> > >> >
> > >> > On Tue, Dec 5, 2017 at 12:04 PM, Vadim Gindin <vgindin@detectum.com>
> > >> > wrote:
> > >> >
> > >> > > For example like this:
> > >> > >
> > >> > > BooleanQuery.Builder expected = new BooleanQuery.Builder();
> > >> > >
> > >> > > Query param_vendor = new BoostQuery(new ConstantScoreQuery(
> > >> > >         new TermQuery(new Term("param_vendor", queryStr))), 5f);
> > >> > > Query param_model = new BoostQuery(new ConstantScoreQuery(
> > >> > >         new TermQuery(new Term("param_model", queryStr))), 5f);
> > >> > > Query param_value = new BoostQuery(new ConstantScoreQuery(
> > >> > >         new TermQuery(new Term("param_value", queryStr))), 3f);
> > >> > > Query param_name = new BoostQuery(new ConstantScoreQuery(
> > >> > >         new TermQuery(new Term("param_name", queryStr))), 4f);
> > >> > >
> > >> > > BooleanQuery bq = expected
> > >> > >         .add(param_vendor, BooleanClause.Occur.SHOULD)
> > >> > >         .add(param_model, BooleanClause.Occur.SHOULD)
> > >> > >         .add(param_value, BooleanClause.Occur.SHOULD)
> > >> > >         .add(param_name, BooleanClause.Occur.SHOULD)
> > >> > >         .setMinimumNumberShouldMatch(1)
> > >> > >         .build();
> > >> > >
> > >> > > return new BoostQuery(bq, queryBoost);
> > >> > >
> > >> > >
> > >> > > Vadim
> > >> > >
> > >> > > On Tue, Dec 5, 2017 at 9:24 AM, Michael Sokolov <msokolov@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > >> Well, how did you make the original query?
> > >> > >>
> > >> > >> On Dec 4, 2017 12:05 PM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> > >> > >>
> > >> > >> > Yes, thanks. My question is exactly about how to create "another
> > >> > >> > extra query that requires all the terms in the original query".
> > >> > >> >
> > >> > >> > On Mon, Dec 4, 2017 at 6:50 PM, Michael Sokolov <msokolov@gmail.com>
> > >> > >> > wrote:
> > >> > >> >
> > >> > >> > > I'm just saying that when you form your query, you could also
> > >> > >> > > create another, extra query that requires all the terms in the
> > >> > >> > > original query, and then combine it with the original query in a
> > >> > >> > > boolean query where the original query is required and the extra
> > >> > >> > > query is optional. That will give a boost when all the terms are
> > >> > >> > > found, although I think the scores will be added, not multiplied.
> > >> > >> > >
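Because the optional clause adds to the score rather than multiplying it, its boost would have to be chosen to emulate a multiplier. Here is a minimal arithmetic sketch, assuming every clause of the original query is a constant-score clause that contributes exactly its boost, and assuming nothing else rescales the sum (no coord factor or query normalization, which depends on the Lucene version and similarity in use). Under those assumptions a document matching every clause always scores the fixed sum of the boosts, so an additive bonus of (k - 1) * sum equals multiplying by k. The class and method names are hypothetical.

```java
// Hypothetical arithmetic helper, not Lucene API.
final class AdditiveBonusSketch {

    /** Score of a document matching every constant-score clause: the sum of the boosts. */
    static float fullMatchScore(float[] clauseBoosts) {
        float sum = 0f;
        for (float boost : clauseBoosts) {
            sum += boost;
        }
        return sum;
    }

    /** Boost for the optional all-terms clause such that sum + bonus == k * sum. */
    static float bonusFor(float[] clauseBoosts, float k) {
        return (k - 1f) * fullMatchScore(clauseBoosts);
    }
}
```

With the boosts from this thread (5, 5, 3, 4) the full-match score is 17, so a multiplier of 2 corresponds to an additive bonus of 17. This shortcut does not carry over to clauses whose scores vary per document.
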
> > >> > >> > > On Dec 4, 2017 5:22 AM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> > >> > >> > >
> > >> > >> > > > Thanks, Michael!
> > >> > >> > > >
> > >> > >> > > > Yes, I'm sure. Could you explain your proposal in more detail?
> > >> > >> > > >
> > >> > >> > > > Regards,
> > >> > >> > > > Vadim Gindin
> > >> > >> > > >
> > >> > >> > > > On Mon, Dec 4, 2017 at 3:18 PM, Michael Sokolov <msokolov@gmail.com>
> > >> > >> > > > wrote:
> > >> > >> > > >
> > >> > >> > > > > You could combine a Boolean AND query with the same terms, as
> > >> > >> > > > > an optional clause. Are you sure about the requirement to
> > >> > >> > > > > multiply the score in that case?
> > >> > >> > > > >
> > >> > >> > > > > On Dec 4, 2017 5:13 AM, "Vadim Gindin" <vgindin@detectum.com> wrote:
> > >> > >> > > > >
> > >> > >> > > > > > Hi all.
> > >> > >> > > > > >
> > >> > >> > > > > > I need to track whether all query terms are matched in one
> > >> > >> > > > > > document. When all terms are matched, I need to multiply the
> > >> > >> > > > > > score of that document by some constant coefficient.
> > >> > >> > > > > >
> > >> > >> > > > >
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Sincerely yours
> > >> Mikhail Khludnev
> > >>
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
