lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene 7.x custom Scorer on point values
Date Thu, 12 Oct 2017 07:27:08 GMT
Hi,

I was talking about a purely binary DocValues field: not searchable, not stored, nothing else. A
completely separate field that stores the values in their original order in binary form (e.g.
47*4 bytes if they are ints or floats), used just for scoring. DocValues fields other than numeric
are binary by default!
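
A minimal sketch of such an order-preserving packing, using only the JDK. The class and method
names (PackedValues, encode, decode) are hypothetical, and it assumes long values at 8 bytes each
rather than the 4-byte ints or floats mentioned above. In Lucene the resulting array would
presumably be wrapped in a BytesRef and indexed with a BinaryDocValuesField, then decoded again
at scoring time; that wiring is not shown here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class PackedValues {

    /** Packs the values in their original order, 8 bytes per long, big-endian. */
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Reads the longs back in the order in which they were written. */
    static long[] decode(byte[] bytes, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    public static void main(String[] args) {
        long[] original = {30L, -7L, 12L};            // deliberately not sorted
        byte[] packed = encode(original);             // value for the binary field
        long[] restored = decode(packed, 0, packed.length);
        System.out.println(Arrays.toString(restored));
    }
}
```

The decoding loop is the per-document overhead mentioned earlier in the thread: every scored hit
pays for one pass over the whole array.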

But for _exactly_ 47 values I'd use 47 separate numeric docvalues-only fields like "value01,
value02, value03". The searchable field stays multivalued and is just called "value". Still,
reading 47 numeric fields at scoring time is a bit much. Is there no way to combine all those
values into fewer fields used solely for scoring (e.g. 2 values such as a linear factor
and a quadratic factor, or whatever)? It's hard to imagine that you need all the values while
scoring!
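
The one-field-per-position naming can be sketched as follows, again with the JDK only. The names
PositionalFields, fieldName and toFields are hypothetical, and the map merely stands in for the
47 docvalues-only fields; in Lucene each entry would presumably become a
NumericDocValuesField(name, value) added to the document.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class PositionalFields {

    /** Maps a zero-based position to a name such as "value01", "value02", ... */
    static String fieldName(int position) {
        return String.format(Locale.ROOT, "value%02d", position + 1);
    }

    /** The position of each value is carried by its field name, so no ordering is lost. */
    static Map<String, Long> toFields(long[] values) {
        Map<String, Long> fields = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            fields.put(fieldName(i), values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Three values instead of 47, to keep the output short.
        System.out.println(toFields(new long[] {30L, -7L, 12L}));
    }
}
```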

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Dominik Safaric [mailto:dominiksafaric@gmail.com]
> Sent: Thursday, October 12, 2017 8:53 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 7.x custom Scorer on point values
> 
> The number of values per document per field is equal to 47.
> 
> Unfortunately using binary fields is not an option because a binary field
> is not searchable. However, using a keyword field where the array of long
> values would be equivalent to a hex encoded binary array and later
> retrieving them as binary data might do the trick. But before that, could
> you please explain how keyword fields are stored within Lucene? I'm asking
> because unfortunately I haven't found any information about it online.
> 
> Thanks,
> Dominik
> 
> 2017-10-11 13:59 GMT+02:00 Uwe Schindler <uwe@thetaphi.de>:
> 
> > Hi,
> >
> > if you have multiple docvalues for the same field in the same document,
> > the order is undefined. The original order is not preserved, sorry. How
> > many values per document do you have? If it’s a fixed number or low, I'd go
> > with single valued fields.
> >
> > If you really need multi-valued docvalues where the order is preserved,
> > you can go and use binary bytes instead and encode your values into it. But
> > this is much more expensive to use during scoring (decoding overhead,...).
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > > -----Original Message-----
> > > From: Dominik Safaric [mailto:dominiksafaric@gmail.com]
> > > Sent: Wednesday, October 11, 2017 1:39 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Lucene 7.x custom Scorer on point values
> > >
> > > Thanks Uwe for the clarification.
> > >
> > > The values are already indexed as numeric docvalues, i.e. numeric
> > > point-docvalues. In both cases, whether I implement a custom scorer or a
> > > function query, I would need to access the point values for the matched/hit
> > > documents. How can I obtain these values given a DocIdSetIterator (the subset
> > > of documents, i.e. the hit document ids) and a LeafReaderContext? Calling
> > > getSortedNumericDocValues("field") gives me the longs in question;
> > > however, these values are sorted using Long.compare, whereas in my case the
> > > order of the values for a particular field matters.
> > >
> > > Kind regards,
> > > Dominik
> > >
> > > 2017-10-11 11:43 GMT+02:00 Uwe Schindler <uwe@thetaphi.de>:
> > >
> > > > Hi,
> > > >
> > > > You would need to index that as numeric docvalues. Just add another field
> > > > of type numeric docvalues with the same or a different name and use the
> > > > LeafReader's docvalues accessors to fetch values. But that's all way too
> > > > hard. You can create function queries without hassle using the function
> > > > queries package. Or much better: I'd use the Lucene expressions module to
> > > > do this. It allows you to express the scoring formula as a JavaScript
> > > > formula and use all docvalues fields in your document to calculate the
> > > > final score.
> > > >
> > > > In both cases there is no need to create a custom scorer and everything
> > > > works efficiently. Creating your own scorers just for this is way too
> > > > complicated and not recommended. It leads to usage errors like the ones you
> > > > have discovered: slow stored fields, misuse of the docvalues APIs (those are
> > > > iterators, too) or other problems.
> > > >
> > > > Uwe
> > > >
> > > > -----
> > > > Uwe Schindler
> > > > Achterdiek 19, D-28357 Bremen
> > > > http://www.thetaphi.de
> > > > eMail: uwe@thetaphi.de
> > > >
> > > > > -----Original Message-----
> > > > > From: Dominik Safaric [mailto:dominiksafaric@gmail.com]
> > > > > Sent: Wednesday, October 11, 2017 11:23 AM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Lucene 7.x custom Scorer on point values
> > > > >
> > > > > Recently I've implemented a custom Query that in turn scores documents
> > > > > using a custom Scorer implementation based on primitive long point values.
> > > > > The associated field is multi-valued and has doc values enabled. For
> > > > > retrieving these multi-valued longs I've used LeafReader.document() within
> > > > > the Scorer implementation. However, the invocation requires iterating
> > > > > through the space of matching documents, which may degrade performance.
> > > > >
> > > > > Hence my question is: what would be the most efficient implementation of a
> > > > > custom Scorer that computes scores based on the values of a multi-valued
> > > > > long points field?
> > > > >
> > > > > Thanks in advance,
> > > > > Dominik
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >

