lucene-java-user mailing list archives

From Dominik Safaric <dominiksafa...@gmail.com>
Subject Re: Lucene 7.x custom Scorer on point values
Date Thu, 12 Oct 2017 06:52:58 GMT

Each field holds 47 values per document.

Unfortunately, binary fields are not an option because a binary field is
not searchable. However, hex-encoding the array of long values into a
keyword field and later decoding it back into binary data might do the
trick. But before that, could you please explain how keyword fields are
stored within Lucene? I'm asking because I haven't found any information
about it online.
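
To make the hex-encoding idea concrete, here is a minimal sketch that uses
only the JDK. The class name OrderedLongCodec and its methods are
hypothetical helpers, not part of Lucene. It assumes a fixed 8-byte
big-endian layout per value, so the original order of the longs survives
the round trip. How the resulting string or bytes are then indexed (for
example as a keyword term or as binary docvalues) is left open here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper (not a Lucene class): packs an ordered long[] into
// bytes and a hex string, and unpacks it again without reordering.
final class OrderedLongCodec {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Writes each value as 8 big-endian bytes, in the order given. */
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Reads the values back in the order they were written. */
    static long[] decode(byte[] bytes) {
        if (bytes.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] values = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    /** Lower-case hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Inverse of toHex; rejects odd lengths and non-hex characters. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex length must be even");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] original = {42L, -7L, 3L}; // deliberately not sorted
        String term = toHex(encode(original));
        long[] roundTrip = decode(fromHex(term));
        // prints "48 hex chars, order preserved: true"
        System.out.println(term.length() + " hex chars, order preserved: "
                + Arrays.equals(original, roundTrip));
    }
}
```

With 47 values per field this layout gives 376 bytes, or 752 hex
characters, per document. Note that a single encoded term can only be
matched as a whole, so the individual longs would still need their own
indexed fields if they have to stay searchable.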

Thanks,
Dominik

2017-10-11 13:59 GMT+02:00 Uwe Schindler <uwe@thetaphi.de>:

> Hi,
>
> if you have multiple docvalues for the same field in the same document,
> the order is undefined. The original order is not preserved, sorry. How
> many values per document do you have? If it’s a fixed number or low, I'd go
> with single valued fields.
>
> If you really need multi-valued docvalues where the order is preserved,
> you can go and use binary bytes instead and encode your values into it. But
> this is much more expensive to use during scoring (decoding overhead,...).
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Dominik Safaric [mailto:dominiksafaric@gmail.com]
> > Sent: Wednesday, October 11, 2017 1:39 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene 7.x custom Scorer on point values
> >
> > Thanks Uwe for the clarification.
> >
> > The values are already indexed as numeric docvalues, i.e. as long point
> > fields with docvalues enabled. With either a custom scorer or a function
> > query, I would need to access the values for the matched (hit)
> > documents. How can I obtain these values given a DocIdSetIterator (the
> > subset of documents, i.e. the hit document ids) and a LeafReaderContext?
> > getSortedNumericDocValues("field") does return the longs in question,
> > but they come back sorted by Long.compare, whereas in my case the order
> > of the values within a particular field matters.
> >
> > Kind regards,
> > Dominik
> >
> > 2017-10-11 11:43 GMT+02:00 Uwe Schindler <uwe@thetaphi.de>:
> >
> > > Hi,
> > >
> > > You would need to index that as numeric docvalues. Just add another
> > > field of type numeric docvalues, with the same or a different name,
> > > and use the LeafReader's docvalues accessors to fetch the values. But
> > > that's all way too hard. You can create function queries without
> > > hassle using the function queries package. Or much better: I'd use the
> > > Lucene expressions module to do this. It allows you to express the
> > > scoring formula as a JavaScript formula and use all docvalues fields
> > > in your document to calculate the final score.
> > >
> > > In both cases there is no need to create a custom scorer, and
> > > everything works efficiently. Creating your own scorers just for this
> > > is way too complicated and not recommended. It leads to usage errors
> > > like the ones you have discovered: slow stored fields, misuse of the
> > > docvalues APIs (those are iterators, too), or other problems.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Dominik Safaric [mailto:dominiksafaric@gmail.com]
> > > > Sent: Wednesday, October 11, 2017 11:23 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Lucene 7.x custom Scorer on point values
> > > >
> > > > Recently I've implemented a custom Query that scores documents with
> > > > a custom Scorer implementation based on long point values. The
> > > > associated field is multi-valued and has doc values enabled. To
> > > > retrieve these multi-valued longs I've used LeafReader.document()
> > > > within the Scorer implementation. However, this has to be invoked
> > > > for each document in the set of matching documents, which may
> > > > degrade performance.
> > > >
> > > > Hence my question: what would be the most efficient implementation
> > > > of a custom Scorer that computes scores based on the values of a
> > > > multi-valued long points field?
> > > >
> > > > Thanks in advance,
> > > > Dominik
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
