lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Commented] (SOLR-6575) ValueSources/FunctionValues should be able to (dynamically) indicate their prefered data type (propogating up)
Date Mon, 11 Apr 2016 19:49:25 GMT


Hoss Man commented on SOLR-6575:

I started writing this as a side comment in LUCENE-5325 before i remembered this issue already
existed, so i'm posting it here and then i'll cross link...



* ValueSource (via FunctionValues) exposes various type-specific accessors (boolVal, byteVal,
doubleVal, floatVal, strVal, etc...) which can be used by callers who care about receiving
a specific type -- in which case the ValueSource is expected to "do its best" to return
whatever info it models in that type (typically a simple cast).
* in practice, almost every "ValueSource wrapper" i can think of basically ignores the "requested
type" when its FunctionValues are used: it typically just uses doubleVal from the wrapped
ValueSource/FunctionValues, and then does its own simple cast. (see DualFloatFunction)
* one special case FunctionValues method is "objectVal", where the FunctionValues/ValueSource
gets to make its own decision about the type of object to "pass back", based on what makes
the most sense given the source of the underlying ValueSource. (ex: the DocValues type for
fields, int for NumDocsValueSource, etc..)
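
To make the second bullet concrete, here is a minimal sketch of what "ignoring the requested type" costs. None of these classes exist in Lucene: {{SketchValues}}, {{LongFieldValues}} and {{DoubleOnlyProduct}} are hypothetical, stripped-down stand-ins for FunctionValues, a long-backed field, and a math wrapper that only ever calls doubleVal on what it wraps.

```java
// Hypothetical, simplified stand-in for FunctionValues (not the Lucene class).
abstract class SketchValues {
    abstract double doubleVal(int doc);

    // Default typed accessors: a simple cast from the double value.
    long longVal(int doc) { return (long) doubleVal(doc); }
    Object objectVal(int doc) { return doubleVal(doc); } // boxes to Double
}

// A long-backed "field" that can answer longVal and objectVal in its native type.
class LongFieldValues extends SketchValues {
    private final long[] values;
    LongFieldValues(long... values) { this.values = values; }

    @Override double doubleVal(int doc) { return (double) values[doc]; }
    @Override long longVal(int doc) { return values[doc]; }
    @Override Object objectVal(int doc) { return values[doc]; } // boxes to Long
}

// A wrapper that always goes through doubleVal, whatever the caller asked for.
class DoubleOnlyProduct extends SketchValues {
    private final SketchValues a;
    private final SketchValues b;
    DoubleOnlyProduct(SketchValues a, SketchValues b) { this.a = a; this.b = b; }

    @Override double doubleVal(int doc) { return a.doubleVal(doc) * b.doubleVal(doc); }
    // longVal and objectVal are inherited, so they are casts of the double product.
}

class WrapperDemo {
    public static void main(String[] args) {
        SketchValues big = new LongFieldValues(9007199254740993L); // 2^53 + 1
        SketchValues one = new LongFieldValues(1L);
        SketchValues product = new DoubleOnlyProduct(big, one);

        System.out.println(big.longVal(0));      // the exact long from the field
        System.out.println(product.longVal(0));  // off by one: the value passed through a double
        System.out.println(product.objectVal(0).getClass().getSimpleName()); // Double
    }
}
```

Even though both inputs are longs and the caller asked for a long, the wrapper's answer has been rounded through a double, and its objectVal reports Double rather than Long.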


* there should be a way to pass the "native typing" of a ValueSource *up* the stack, and a
way to pass "type preference" down the stack.
* If you wrap a "math function" around 2 arbitrary ValueSources, but you don't have a preference
about the specific type of the results (ex: a solr user has asked for {{product(fieldA,fieldB)}}
-- or someone has created a similar looking Expression object in the java API) the resulting
math operations done at the FunctionValues level should look at the "native typing" information
"passed up" from the wrapped ValueSources to decide what data types to use, what to return
by default from things like the objectVal method, and what "native typing" to in turn pass
up to its own caller
** example: if you wrap 2 IntDocValues field ValueSources in a MathMultiplyValueSource maybe
the "native result" should be a ValueSource that defaults to returning LongValues
* If callers want to force the result to be an explicit type, they should still be able to
do that -- either themselves, or based on the choice of method they call (ie: the current
FunctionValues methods like intVal, floatVal, etc...) w/o that preference automatically propagating
** example: calling floatVal on a MathMultiplyValueSource that wraps 2 ints should be akin
to: {{long result = (long) intval1 * intval2; return (float) result;}}
not {{return ((float)intval1) * ((float)intval2)}}
* we should have ValueSource wrappers that can act as "numeric casts" for folks who explicitly
want to inject a type preference at arbitrary places in the hierarchy.
** example: if you prefer to use floating point multiplication on two ValueSources, regardless
of what the "native type" of those ValueSources is, you can wrap each of them in a "CastAsFloatValueSource"
and then wrap all of those in your MathMultiplyValueSource.
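
The difference between the two floatVal behaviours in the example above is observable, not just stylistic. A small sketch, with hypothetical method names ({{nativeThenCast}}, {{castThenMultiply}}) standing in for the two strategies:

```java
class MultiplySketch {
    // "Native" path: multiply in the wider integral type, cast the result once.
    static float nativeThenCast(int a, int b) {
        long result = (long) a * b; // widen first so the product cannot overflow int
        return (float) result;
    }

    // "Cast first" path: what you get if the float preference propagates to the inputs
    // (or if each input is explicitly wrapped in a float cast).
    static float castThenMultiply(int a, int b) {
        return ((float) a) * ((float) b);
    }

    public static void main(String[] args) {
        int a = 16_777_217; // 2^24 + 1, which a float cannot represent exactly
        int b = 3;
        System.out.println(nativeThenCast(a, b));   // rounds the exact product 50331651 once
        System.out.println(castThenMultiply(a, b)); // rounds a first, then multiplies
    }
}
```

With these inputs the first method returns 50331652 as a float and the second returns 50331648, because the second loses the low bit of {{a}} before the multiplication happens. The explicit cast wrappers are what let a caller opt into the second behaviour on purpose.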


There's probably a great way to accomplish much of this very naturally if we start moving
towards a more type safe ValueSource API that makes better use of generics (although i'm not really
sure how, if we want to keep optimizing for primitive types like int/float/long/double instead
of Integer/Float/Long/Double), but as a straw man to try and clarify what i'm talking about...

Imagine adding the following to the existing ValueSource APIs..

* imagine if we add a {{Class getNativeClassValue()}} method to ValueSource, documented as
always returning the same Class as you would get from any call to {{FunctionValues.objectVal(int).getClass()}}
when using this ValueSource directly
* for Math based wrapper ValueSources, the type info from the getNativeClassValue() methods
of the ValueSources they wrap should be used to decide what FunctionValues impls to return
(ie: MathMultiplyValueSource could check the getNativeClassValue() of each VS it wraps
to find the least common denominator of the types and decide when to use something like
MultiplyLongValueSource or MultiplyDoubleValueSource)
* new classes like CastFloatValueSource would be ValueSource wrappers that completely
ignore the getNativeClassValue() of the VS they wrap. They would instead implement getNativeClassValue
to return a constant Class (ex: Float), and would use a FunctionValues impl whose objectVal
(and other methods) would just call the method that matches the cast they are supposed to do
on the class they wrap -- ie:
{code}
// "inner" is the FunctionValues of the wrapped ValueSource
Object objectVal(int doc) { return Float.valueOf(floatVal(doc)); }
int intVal(int doc) { return (int) floatVal(doc); }
float floatVal(int doc) { return inner.floatVal(doc); }
{code}
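
Putting the three bullets together, a self-contained sketch of the straw man might look like the following. Everything here is hypothetical: {{TypedSource}} folds ValueSource and FunctionValues into one class for brevity, only {{getNativeClassValue}} comes from the proposal itself, and anything non-integral is collapsed to Double to keep it short.

```java
// Hypothetical: ValueSource + FunctionValues merged into one class for brevity.
abstract class TypedSource {
    abstract Class<?> getNativeClassValue();
    abstract long longVal(int doc);
    abstract double doubleVal(int doc);
    abstract Object objectVal(int doc);
    float floatVal(int doc) { return (float) doubleVal(doc); }
}

class IntFieldSource extends TypedSource {
    private final int[] values;
    IntFieldSource(int... values) { this.values = values; }
    Class<?> getNativeClassValue() { return Integer.class; }
    long longVal(int doc) { return values[doc]; }
    double doubleVal(int doc) { return values[doc]; }
    Object objectVal(int doc) { return values[doc]; } // boxes to Integer
}

// Ignores the native type of what it wraps and always reports Float.
class CastFloatSource extends TypedSource {
    private final TypedSource inner;
    CastFloatSource(TypedSource inner) { this.inner = inner; }
    Class<?> getNativeClassValue() { return Float.class; }
    @Override float floatVal(int doc) { return inner.floatVal(doc); }
    long longVal(int doc) { return (long) floatVal(doc); }
    double doubleVal(int doc) { return floatVal(doc); }
    Object objectVal(int doc) { return Float.valueOf(floatVal(doc)); }
}

// Picks long or double math based on the native types passed up by its inputs.
class MathMultiplySource extends TypedSource {
    private final TypedSource a;
    private final TypedSource b;
    private final boolean integral;

    MathMultiplySource(TypedSource a, TypedSource b) {
        this.a = a;
        this.b = b;
        this.integral = isIntegral(a.getNativeClassValue()) && isIntegral(b.getNativeClassValue());
    }

    private static boolean isIntegral(Class<?> c) {
        return c == Integer.class || c == Long.class;
    }

    Class<?> getNativeClassValue() { return integral ? Long.class : Double.class; }

    long longVal(int doc) {
        return integral ? a.longVal(doc) * b.longVal(doc) : (long) doubleVal(doc);
    }

    double doubleVal(int doc) {
        return integral ? (double) longVal(doc) : a.doubleVal(doc) * b.doubleVal(doc);
    }

    Object objectVal(int doc) {
        // if/else rather than ?: so the long is not promoted to double before boxing
        if (integral) {
            return longVal(doc);
        }
        return doubleVal(doc);
    }
}

class NativeTypeDemo {
    public static void main(String[] args) {
        TypedSource x = new IntFieldSource(2_000_000_000);
        TypedSource y = new IntFieldSource(3);
        TypedSource ints = new MathMultiplySource(x, y);
        TypedSource floats = new MathMultiplySource(new CastFloatSource(x), new CastFloatSource(y));
        System.out.println(ints.getNativeClassValue().getSimpleName() + " " + ints.objectVal(0));
        System.out.println(floats.getNativeClassValue().getSimpleName() + " " + floats.objectVal(0));
    }
}
```

The contract from the first bullet holds for each class in the sketch: the runtime class of objectVal matches what getNativeClassValue reports.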

> ValueSources/FunctionValues should be able to (dynamically) indicate their prefered data
type (propogating up)
> --------------------------------------------------------------------------------------------------------------
>                 Key: SOLR-6575
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
> Something i've been thinking about for a while, but SOLR-6562 recently goaded me into
opening a jira for...
> The ValueSource/FunctionValues API is designed to work with different levels of math
precision (int, long, float, double, date, etc...) and the FunctionValues.objectVal() method
provides a generic way to fetch an arbitrary type from any FunctionValues instance -- by which
the "preferred" type for a given ValueSource can be retrieved (ie: an "Integer"
if the ValueSource corresponds to the DocValues of an int field).
> But for ValueSources that wrap other value sources (ie: implementing math functions like
"sum" or "product") there is no easy way at runtime to know which of the underlying methods
on the FunctionValues is the "best" one to call.  It would be helpful if FunctionValues or
ValueSource had some type of method on it (ie: "canonicalDataType()") that could return some
enumeration value indicating which of the various low level methods (intVal(docid), floatVal(docid),
etc...) is best suited for the data it represents.
> Straw man idea...
> For the lowest level ValueSources coming from DocValues, these methods could return a constant
-- but for things like "SumValueSource" "canonicalDataType()" could be recursive -- returning
the least common denominator of the ValueSources it wraps. The corresponding intVal() and
floatVal() methods in that class could then cast appropriately.
> So if you have a SumValueSource wrapped around several IntDocValuesSources, SumValueSource.canonicalDataType()
would return "INT", and if you called SumValueSource's FunctionValues.intVal(docid) it would
add up the results of the intVal(docid) methods on all of the wrapped FunctionValues -- but
floatVal(docid) would/could still add up the results of the floatVal(docid) calls on
all of the wrapped FunctionValues (for people who want to coerce float based math -- ie: SOLR-6574)
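
The original straw man from the issue description can be sketched in a few lines. The names below are all hypothetical except canonicalDataType(): {{DataType}} is one possible shape for the "enumeration value", its narrowest-to-widest ordering is an assumption about what "least common denominator" should mean, and {{IntLeaf}}/{{FloatLeaf}} stand in for DocValues-backed sources.

```java
// Hypothetical enumeration, ordered narrowest to widest (an assumption, not a spec).
enum DataType {
    INT, LONG, FLOAT, DOUBLE;

    static DataType wider(DataType a, DataType b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}

abstract class TypedValues {
    abstract DataType canonicalDataType();
    abstract int intVal(int doc);
    abstract float floatVal(int doc);
}

// Lowest level: the canonical type is a constant.
class IntLeaf extends TypedValues {
    private final int[] values;
    IntLeaf(int... values) { this.values = values; }
    DataType canonicalDataType() { return DataType.INT; }
    int intVal(int doc) { return values[doc]; }
    float floatVal(int doc) { return (float) values[doc]; }
}

class FloatLeaf extends TypedValues {
    private final float[] values;
    FloatLeaf(float... values) { this.values = values; }
    DataType canonicalDataType() { return DataType.FLOAT; }
    int intVal(int doc) { return (int) values[doc]; }
    float floatVal(int doc) { return values[doc]; }
}

// Wrapper: the canonical type is computed recursively from the wrapped sources.
class SumSketch extends TypedValues {
    private final TypedValues[] sources;
    SumSketch(TypedValues... sources) { this.sources = sources; }

    DataType canonicalDataType() {
        DataType result = DataType.INT;
        for (TypedValues s : sources) {
            result = DataType.wider(result, s.canonicalDataType());
        }
        return result;
    }

    // Each accessor does its math in the type the caller asked for.
    int intVal(int doc) {
        int sum = 0;
        for (TypedValues s : sources) sum += s.intVal(doc);
        return sum;
    }

    float floatVal(int doc) {
        float sum = 0f;
        for (TypedValues s : sources) sum += s.floatVal(doc);
        return sum;
    }
}

class SumDemo {
    public static void main(String[] args) {
        TypedValues ints = new SumSketch(new IntLeaf(16_777_217), new IntLeaf(1));
        System.out.println(ints.canonicalDataType()); // INT
        System.out.println(ints.intVal(0));           // exact int sum
        System.out.println(ints.floatVal(0));         // float math, rounded

        TypedValues mixed = new SumSketch(ints, new FloatLeaf(0.5f));
        System.out.println(mixed.canonicalDataType()); // FLOAT
    }
}
```

A caller with no preference would check canonicalDataType() and then call the matching accessor; a caller who wants float based math can still call floatVal directly and get float addition all the way down.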
