Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Mon, 11 Apr 2016 19:49:25 +0000 (UTC)
From: "Hoss Man (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <JIRA.12745023.1412102300000.197070.1460404165577@Atlassian.JIRA>
In-Reply-To: <JIRA.12745023.1412102300000@Atlassian.JIRA>
References: <JIRA.12745023.1412102300000@Atlassian.JIRA>
 <JIRA.12745023.1412102300535@arcas>
Subject: [jira] [Commented] (SOLR-6575) ValueSources/FunctionValues should
 be able to (dynamically) indicate their prefered data type (propogating up)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/SOLR-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235829#comment-15235829 ] 

Hoss Man commented on SOLR-6575:
--------------------------------

I started writing this as a side comment in LUCENE-5325, before i remembered this issue already existed - so i'm posting it here and then i'll cross link...

----

Currently...

* ValueSource (via FunctionValues) exposes various type specific accessors (boolVal, byteVal, doubleVal, floatVal, strVal, etc...) which can be used by callers who care about receiving a specific type -- in which case the ValueSource is expected to "do its best" to return whatever info it models in that type (typically a simple cast).
* in practice, almost every "ValueSource wrapper" i can think of basically ignores the "requested type" when its FunctionValues are used, and typically just uses doubleVal from the wrapped ValueSource/FunctionValues, and then does its own simple cast. (see DualFloatFunction)
* one special case FunctionValues method is "objectVal", in which case the FunctionValues/ValueSource gets to make its own decision about the type of object to "pass back" based on what makes the most sense given the source of the underlying ValueSource.  (ex: the DocValues type for fields, int for NumDocsValueSource, etc..)
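
To make the "wrappers ignore the requested type" point concrete, here's a rough sketch. Everything in it ({{CurrentBehaviorSketch}}, {{Values}}, {{longField}}, {{sum}}) is a made-up, cut-down stand-in and not the real Lucene/Solr API; it only assumes that a wrapper does its math on doubleVal and that the typed accessors are simple casts of that result.

```java
// Illustration only: simplified stand-ins, NOT the real Lucene/Solr classes.
final class CurrentBehaviorSketch {

    /** Hypothetical, cut-down analogue of FunctionValues. */
    interface Values {
        double doubleVal(int doc);
        // Typed accessors default to a simple cast of doubleVal.
        default long longVal(int doc) { return (long) doubleVal(doc); }
        default Object objectVal(int doc) { return Double.valueOf(doubleVal(doc)); }
    }

    /** A field-like source whose native type is long, backed by an array indexed by doc id. */
    static Values longField(long[] data) {
        return new Values() {
            @Override public double doubleVal(int doc) { return (double) data[doc]; }
            @Override public long longVal(int doc) { return data[doc]; }
            @Override public Object objectVal(int doc) { return Long.valueOf(data[doc]); }
        };
    }

    /** A wrapper that always does its math on doubleVal, whatever accessor the caller uses. */
    static Values sum(Values a, Values b) {
        return doc -> a.doubleVal(doc) + b.doubleVal(doc);
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1; // not exactly representable as a double
        Values wrapped = sum(longField(new long[] {big}), longField(new long[] {0L}));
        System.out.println("exact long sum    : " + big);
        System.out.println("wrapper longVal   : " + wrapped.longVal(0));
        System.out.println("wrapper objectVal : " + wrapped.objectVal(0).getClass().getSimpleName());
    }
}
```

In this sketch the field itself hands back a Long from objectVal, but as soon as it is wrapped the caller gets a Double, and longVal on the wrapper no longer matches the exact long sum because the value took a round trip through double.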

Ideally...

* there should be a way to pass the "native typing" of a ValueSource *up* the stack, and a way to pass "type preference" down the stack.
* If you wrap a "math function" around 2 arbitrary ValueSources, but you don't have a preference about the specific type of the results (ex: a solr user has asked for {{product(fieldA,fieldB)}} -- or someone has created a similar looking Expression object in the java API) the resulting math operations done at the FunctionValues level should look at the "native typing" information "passed up" from the wrapped ValueSources to decide what data types to use, what to return by default from things like the objectVal method, and what "native typing" to in turn pass up to its own caller
** example: if you wrap 2 IntDocValues field ValueSources in a MathMultiplyValueSource maybe the "native result" should be a ValueSource that defaults to returning LongValues
* If callers want to force the result to be an explicit type - they should still be able to do that -- either themselves, or based on the choice of method they call (ie: the current FunctionValues methods like intVal, floatVal, etc...) w/o that preference automatically propagating down
** example: calling floatVal on a MathMultiplyValueSource that wraps 2 ints should be akin to: {{long result = (long) intval1 * intval2; return (float) result;}} not {{return ((float)intval1) * ((float)intval2)}}
* we should have ValueSource wrappers that can act as "numeric casts" for folks who explicitly want to inject a type preference at arbitrary places in the hierarchy.
** example: if you prefer to use floating point multiplication on two ValueSources, regardless of what the "native type" of those ValueSources is, you can wrap each of them in a "CastAsFloatValueSource" and then wrap all of those in your MathMultiplyValueSource.
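
As a quick sanity check that the two orderings in the floatVal example above really can give different answers, here's a tiny standalone sketch. The class and method names are made up for illustration, and the {{(long)}} widening before the multiply is only there so the int product can't overflow before it becomes a long.

```java
// Illustration only: plain Java arithmetic, no Lucene/Solr classes involved.
final class ForcedTypeSketch {

    /** Integer math first, then a single cast of the result. */
    static float multiplyThenCast(int a, int b) {
        long result = (long) a * b; // widen first so the product cannot overflow int
        return (float) result;
    }

    /** Cast each operand first, then do floating point math. */
    static float castThenMultiply(int a, int b) {
        return ((float) a) * ((float) b);
    }

    public static void main(String[] args) {
        int a = 16_777_217; // 2^24 + 1, not exactly representable as a float
        int b = 3;
        System.out.println("multiply then cast: " + multiplyThenCast(a, b));
        System.out.println("cast then multiply: " + castThenMultiply(a, b));
    }
}
```

With these inputs the two methods return different floats, because casting the operands first throws away the low bit of {{a}} before the multiply ever happens. The second form is what someone would be opting into explicitly by wrapping each input in a cast wrapper.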

----

There's probably a great way to accomplish much of this very naturally if we start moving towards a more type safe ValueSource API utilizing generics better (although i'm not really sure how, if we want to keep optimizing for primitive types like int/float/long/double instead of Integer/Float/Long/Double) but as a straw man to try and clarify what i'm talking about...

Imagine adding the following to the existing ValueSource APIs..

* imagine if we add a {{Class getNativeClassValue()}} method to ValueSource, documented as always returning the same Class as you would get from any call to {{FunctionValues.objectVal(int).getClass()}} when using this ValueSource directly
* for Math based wrapper ValueSources, the type info from the getNativeClassValue() methods of the ValueSources they wrap should be used to decide what FunctionValues impls to return (ie: MathMultiplyValueSource could check the getNativeClassValue() of each VS it wraps to find the least common denominator in the types to decide when to use something like MultiplyLongValueSource, or MultiplyDoubleValueSource)
* new classes like CastFloatValueSource would be ValueSource wrappers that completely ignore the getNativeClassValue() of the VS they wrap. They would instead implement getNativeClassValue to return a constant Class (ex: Float), and would use a FunctionValues impl whose objectVal (and other methods) would just call the method that matches the cast they are supposed to do on the class they wrap -- ie: {code}Object objectVal(int doc) { return Float.valueOf(floatVal(doc)); }
int intVal(int doc) { return (int) floatVal(doc); }
float floatVal(int doc) { return inner.floatVal(doc); }
...{code}
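
Pulling the straw man together, a minimal self-contained sketch might look something like the following. To be clear about what's assumed: {{Source}} is a hypothetical interface that merges ValueSource and FunctionValues just to keep this short, {{ofInts}}/{{ofLongs}}/{{ofDoubles}}/{{multiply}}/{{castFloat}} are invented helpers, and "least common denominator" is read here as "long math if every input is integral, otherwise double math". None of it is existing Lucene/Solr API.

```java
import java.util.function.IntToDoubleFunction;
import java.util.function.IntToLongFunction;

// Illustration only: hypothetical types, NOT the real ValueSource/FunctionValues.
final class NativeClassSketch {

    /** ValueSource and FunctionValues merged into one interface to keep the sketch short. */
    interface Source {
        Class<?> getNativeClassValue();   // the proposed addition
        long longVal(int doc);
        double doubleVal(int doc);
        default float floatVal(int doc) { return (float) doubleVal(doc); }
        Object objectVal(int doc);        // always an instance of getNativeClassValue()
    }

    static Source ofInts(int[] data) {
        return new Source() {
            public Class<?> getNativeClassValue() { return Integer.class; }
            public long longVal(int doc) { return data[doc]; }
            public double doubleVal(int doc) { return data[doc]; }
            public Object objectVal(int doc) { return Integer.valueOf(data[doc]); }
        };
    }

    static Source ofLongs(IntToLongFunction f) {
        return new Source() {
            public Class<?> getNativeClassValue() { return Long.class; }
            public long longVal(int doc) { return f.applyAsLong(doc); }
            public double doubleVal(int doc) { return (double) longVal(doc); }
            public float floatVal(int doc) { return (float) longVal(doc); }
            public Object objectVal(int doc) { return Long.valueOf(longVal(doc)); }
        };
    }

    static Source ofDoubles(IntToDoubleFunction f) {
        return new Source() {
            public Class<?> getNativeClassValue() { return Double.class; }
            public long longVal(int doc) { return (long) doubleVal(doc); }
            public double doubleVal(int doc) { return f.applyAsDouble(doc); }
            public Object objectVal(int doc) { return Double.valueOf(doubleVal(doc)); }
        };
    }

    static boolean isIntegral(Source s) {
        Class<?> c = s.getNativeClassValue();
        return c == Integer.class || c == Long.class;
    }

    /** Picks long or double math based on the native classes passed up by the wrapped sources. */
    static Source multiply(Source a, Source b) {
        if (isIntegral(a) && isIntegral(b)) {
            return ofLongs(doc -> a.longVal(doc) * b.longVal(doc));
        }
        return ofDoubles(doc -> a.doubleVal(doc) * b.doubleVal(doc));
    }

    /** Ignores the native class of the wrapped source and always reports Float. */
    static Source castFloat(Source inner) {
        return new Source() {
            public Class<?> getNativeClassValue() { return Float.class; }
            public float floatVal(int doc) { return inner.floatVal(doc); }
            public long longVal(int doc) { return (long) floatVal(doc); }
            public double doubleVal(int doc) { return floatVal(doc); }
            public Object objectVal(int doc) { return Float.valueOf(floatVal(doc)); }
        };
    }
}
```

With something like this, {{multiply(ofInts(...), ofInts(...))}} reports Long as its native class and does exact long math, while {{multiply(castFloat(x), castFloat(y))}} falls back to floating point math because the cast wrappers report Float no matter what they wrap.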


> ValueSources/FunctionValues should be able to (dynamically) indicate their prefered data type (propogating up)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6575
>                 URL: https://issues.apache.org/jira/browse/SOLR-6575
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> Something i've been thinking about for a while, but SOLR-6562 recently goaded me into opening a jira for...
> The ValueSource/FunctionValues API is designed to work with different levels of math precision (int, long, float, double, date, etc...) and the FunctionValues.objectVal() method provides a generic way to fetch an arbitrary type from any FunctionValues instance -- by which the "preferred" type for a given ValueSource can be retrieved (ie: an "Integer" if the ValueSource corresponds to the DocValues of an int field).
> But for ValueSources that wrap other value sources (ie: implementing math functions like "sum" or "product") there is no easy way at runtime to know which of the underlying methods on the FunctionValues is the "best" one to call.  It would be helpful if FunctionValues or ValueSource had some type of method on it (ie: "canonicalDataType()") that could return some enumeration value indicating which of the various low level methods (intValue(docid), floatValue(docid), etc...) were best suited for the data it represents.
> Straw man idea...
> For the lowest level ValueTypes coming from DocValues, these methods could return a constant -- but for things like "SumValueSource" "canonicalDataType()" could be recursive -- returning the least common denominator of the ValueSources it wraps. The corresponding intValue() and floatValue() methods in that class could then cast appropriately.
> So even if you have SumValueSource wrapped around several IntDocValuesSource, SumValueSource.canonicalDataType() would return "INT" and if you called SumValueSource's FunctionValues.intValue(docid) it would add up the results of the intValue(docid) methods on all of the wrapped FunctionValues -- but floatValue(docid) would/could still add up the results of the floatValue(docid) results from all of the wrapped FunctionValues (for people who want to coerce float based math -- ie: SOLR-6574)
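
For reference, the original canonicalDataType() straw man quoted above could be sketched roughly like this. The {{DataType}} enum, the {{Values}} interface and the helper names are all hypothetical, and "least common denominator" is interpreted as "the widest type among the wrapped sources", using the enum's declaration order as a crude widening order.

```java
// Illustration only: hypothetical types, NOT the real ValueSource/FunctionValues.
final class CanonicalTypeSketch {

    /** Hypothetical enumeration, declared from narrowest to widest. */
    enum DataType { INT, LONG, FLOAT, DOUBLE }

    /** Hypothetical stand-in for ValueSource/FunctionValues with the proposed method. */
    interface Values {
        DataType canonicalDataType();
        int intValue(int docid);
        float floatValue(int docid);
    }

    /** Lowest level source: the canonical type is a constant. */
    static Values intField(int[] data) {
        return new Values() {
            public DataType canonicalDataType() { return DataType.INT; }
            public int intValue(int docid) { return data[docid]; }
            public float floatValue(int docid) { return (float) data[docid]; }
        };
    }

    static Values floatField(float[] data) {
        return new Values() {
            public DataType canonicalDataType() { return DataType.FLOAT; }
            public int intValue(int docid) { return (int) data[docid]; }
            public float floatValue(int docid) { return data[docid]; }
        };
    }

    /** Wrapper: the canonical type is computed recursively from whatever it wraps. */
    static Values sum(Values... wrapped) {
        return new Values() {
            public DataType canonicalDataType() {
                DataType widest = DataType.INT;
                for (Values v : wrapped) {
                    DataType t = v.canonicalDataType();
                    if (t.compareTo(widest) > 0) widest = t;
                }
                return widest;
            }
            public int intValue(int docid) {
                int total = 0;
                for (Values v : wrapped) total += v.intValue(docid); // int based math
                return total;
            }
            public float floatValue(int docid) {
                float total = 0f;
                for (Values v : wrapped) total += v.floatValue(docid); // float based math
                return total;
            }
        };
    }
}
```

A sum over only int fields reports INT here, while mixing in a float field (directly or through a nested sum) bumps the reported type to FLOAT. Callers can still force either kind of math by picking intValue or floatValue themselves.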


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
