lucene-java-user mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject Re: Payload TFIDF Similarity in Lucene 7.1.0
Date Wed, 14 Mar 2018 09:48:42 GMT
Yes, that (LUCENE-7854) is what I was referring to, and you are right that
it stores values as integers. This doesn't necessarily have to be a
blocker; you could scale your values by some factor, I guess.
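A minimal sketch of the scaling idea, using only the JDK. It assumes a hypothetical scale factor of 100 (roughly two decimal places) and the default '|' delimiter of DelimitedTermFrequencyTokenFilter. At index time the float weight becomes an integer frequency in a "term|freq" token. A tf(freq) override could then divide the factor back out. The class and method names are illustrative only and are not part of Lucene.

```java
/**
 * Sketch of storing a float weight as an integer term frequency.
 * SCALE is a hypothetical factor; pick one that fits your value range.
 */
public class ScaledTermFrequency {
    // Hypothetical scale: 100 keeps roughly two decimal places.
    public static final int SCALE = 100;

    // Builds a "term|freq" token, assuming DelimitedTermFrequencyTokenFilter's
    // default '|' delimiter. Term frequencies below 1 are (as far as I know)
    // rejected at index time, so the scaled value is clamped to at least 1.
    public static String encode(String term, float weight) {
        int freq = Math.max(1, Math.round(weight * SCALE));
        return term + "|" + freq;
    }

    // What an overridden tf(float freq) could return to undo the scaling.
    public static float decode(float freq) {
        return freq / SCALE;
    }

    public static void main(String[] args) {
        System.out.println(encode("lucene", 0.37f)); // lucene|37
        System.out.println(decode(37f));             // 0.37
    }
}
```

The scale factor trades precision for range. Weights smaller than 1/SCALE collapse to a frequency of 1.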

On Mar 13, 2018 9:36 AM, "Erdan Genc" <erdan.genc@googlemail.com> wrote:

> @Erik: I didn't know that. How can I figure out which query types support
> payload scoring? The class I described is wrapped into an Elasticsearch
> plugin, so I don't have full control over this. Currently I'm using
> SpanTermQuery; maybe another available query type will do, so that I don't
> need to implement a custom query parser as well. Thank you!
>
> @Michael: This was my first thought as well, but I couldn't find any
> resources when I first searched for it. I just discovered LUCENE-7854
> <https://issues.apache.org/jira/browse/LUCENE-7854>, the
> DelimitedTermFrequencyTokenFilter, but it can't handle floating-point
> values, right? Thanks!
>
> 2018-03-13 12:14 GMT+01:00 Michael Sokolov <msokolov@gmail.com>:
>
> > Also, if you are no longer using the term frequency at all, you might
> > consider wiring your score (the one you are currently wiring into
> > payloads) in there, in place of the term frequency.
> >
> > On Mar 13, 2018 6:57 AM, "Erik Hatcher" <erik.hatcher@gmail.com> wrote:
> >
> > > Payloads are only scored by certain query types. What query are you
> > > executing?
> > >
> > > > On Mar 13, 2018, at 04:58, Grdan Eenc <erdan.genc@googlemail.com> wrote:
> > > >
> > > > Hi there,
> > > >
> > > > I want to extend the TFIDFSimilarity class such that the term frequency
> > > > is ignored and the value in the payload is used instead. Basically, I
> > > > do this:
> > > >
> > > >    @Override
> > > >    public float tf(float freq) {
> > > >        // Ignore the term frequency entirely.
> > > >        return 1f;
> > > >    }
> > > >
> > > >    @Override
> > > >    public float scorePayload(int doc, int start, int end, BytesRef payload) {
> > > >        if (payload != null) {
> > > >            // Read the float stored at the payload's offset.
> > > >            return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
> > > >        } else {
> > > >            // No payload: fall back to a neutral factor.
> > > >            return 1f;
> > > >        }
> > > >    }
> > > >
> > > > The complete class can be found here:
> > > >
> > > > https://gist.github.com/nadre/66be2a2a32214f2c5ec1ec1f6edcef08
> > > >
> > > > Unfortunately, scorePayload never gets called and I end up with the
> > > > wrong scoring. I know that scorePayload is deprecated in Lucene 7.2.1,
> > > > but it should work in 7.1.0, or am I missing something?
> > > >
> > > > I implemented the same thing by directly extending the base Similarity
> > > > class and iterating through the document's terms using the
> > > > LeafReaderContext, based on the code in this repo:
> > > >
> > > > https://github.com/sdauletau/elasticsearch-position-similarity
> > > >
> > > > This works, but it is horribly slow, which is why I would prefer the
> > > > first approach.
> > > >
> > > > Any idea why scorePayload doesn't get called? I really couldn't find
> > > > any resources on the net.
> > > >
> > > > Best, Erdan.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>

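For reference, the scorePayload method in the original message reads the payload with PayloadHelper.decodeFloat(payload.bytes, payload.offset). The JDK-only sketch below assumes that PayloadHelper stores a float as its 4-byte big-endian IEEE 754 bit pattern (via Float.floatToIntBits, as I understand it). FloatPayload and payloadFactor are hypothetical names. As Erik's reply points out, a decoder like this only affects scores when the query being executed actually consults payloads.

```java
import java.nio.ByteBuffer;

/** Sketch of a float payload round trip; names are hypothetical. */
public class FloatPayload {
    // Encode a float as 4 big-endian bytes (ByteBuffer's default order),
    // assumed to match the layout PayloadHelper.encodeFloat produces.
    public static byte[] encode(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Decode the float starting at offset, like a BytesRef's bytes/offset pair.
    public static float decode(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes).getFloat(offset);
    }

    // Mirrors the fallback in the quoted scorePayload: a missing or
    // too-short payload contributes a neutral factor of 1.
    public static float payloadFactor(byte[] bytes, int offset, int length) {
        if (bytes == null || length < 4) {
            return 1f;
        }
        return decode(bytes, offset);
    }

    public static void main(String[] args) {
        byte[] payload = encode(2.5f);
        System.out.println(payload.length);            // 4
        System.out.println(decode(payload, 0));        // 2.5
        System.out.println(payloadFactor(null, 0, 0)); // 1.0
    }
}
```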