lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-10651) Streaming Expressions statistical functions library
Date Mon, 09 Oct 2017 23:57:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197904#comment-16197904 ]

ASF subversion and git services commented on SOLR-10651:
--------------------------------------------------------

Commit f2d1a997dd7894e414ebc4c1460a55e01f2f799a in lucene-solr's branch refs/heads/master
from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f2d1a99 ]

SOLR-10651: binomialCoefficient Stream Evaluator to CHANGES.txt


> Streaming Expressions statistical functions library
> ---------------------------------------------------
>
>                 Key: SOLR-10651
>                 URL: https://issues.apache.org/jira/browse/SOLR-10651
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>
> This is a ticket for organizing the new statistical programming features of Streaming
> Expressions. It's also a place for the community to discuss what functions are needed to support
> statistical programming.
> Basic Syntax:
> {code}
> let(a = timeseries(...),
>     b = timeseries(...),
>     c = col(a, count(*)),
>     d = col(b, count(*)),
>     r = regress(c, d),
>     tuple(p = predict(r, 50)))
> {code}
> The expression above is doing the following:
> 1) The let expression is setting variables (a, b, c, d, r).
> 2) Variables *a* and *b* are the output of timeseries() Streaming Expressions. These
> will be stored in memory as lists of Tuples containing the time series results.
> 3) Variables *c* and *d* are set using the *col* evaluator. The col evaluator extracts
> a column of numbers from a list of tuples. In the example *col* is extracting the count(*)
> field from the two time series result sets.
> 4) Variable *r* is the output from the *regress* evaluator. The regress evaluator performs
> a simple regression analysis on two columns of numbers.
> 5) Once the variables are set, a single Streaming Expression is run by the *let* expression.
> In the example the *tuple* expression is run. The tuple expression outputs a single Tuple
> with name/value pairs. Any Streaming Expression can be run by the *let* expression, so this
> can be a complex program. The streaming expression run by *let* has access to all the variables
> defined earlier.
> 6) The tuple expression in the example has one name/value pair. The name *p* is set
> to the output of the *predict* evaluator. The predict evaluator predicts the value of
> the dependent variable for an independent variable value of 50. The regression result stored in
> variable *r* is used to make the prediction.
> 7) The output of this expression will be a single tuple with the value of the predict
> function in the *p* field.
> The growing list of issues linked to this ticket covers the array manipulation and statistical
> functions that will form the basis of the stats library. The vast majority of these functions
> are backed by algorithms in Apache Commons Math. Other machine learning and math libraries
> will follow.
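
For readers who want to trace what the quoted expression computes, here is a rough Python sketch of steps 3 to 7. It is not Solr code and not the implementation behind these evaluators. The helper functions col, regress and predict are hypothetical stand-ins named after the evaluators, the two input lists are invented sample data in place of the timeseries() results, and the sketch assumes that the first argument of regress is the independent variable and the second the dependent one.

```python
from statistics import mean

def col(tuples, field):
    """Pull one numeric field out of a list of tuples (dicts here)."""
    return [float(t[field]) for t in tuples]

def regress(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Assumption: xs is the independent column, ys the dependent column.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent column has no variance")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def predict(model, x):
    """Evaluate the fitted line at x."""
    return model["intercept"] + model["slope"] * x

# Invented sample data standing in for the two timeseries() result sets.
a = [{"count(*)": 10}, {"count(*)": 20}, {"count(*)": 30}]
b = [{"count(*)": 25}, {"count(*)": 45}, {"count(*)": 65}]

c = col(a, "count(*)")   # step 3: extract the count(*) column from a
d = col(b, "count(*)")   # step 3: extract the count(*) column from b
r = regress(c, d)        # step 4: simple regression on the two columns
result = {"p": predict(r, 50)}  # steps 5-7: a single tuple with field p
print(result)
```

The sample data lies exactly on the line y = 2x + 5, so the fit can be checked by hand.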



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

