# lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-8492) Add LogisticRegressionQuery and LogitStream
Date Wed, 06 Jan 2016 23:38:39 GMT
```
[ https://issues.apache.org/jira/browse/SOLR-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086500#comment-15086500 ]

Cao Manh Dat edited comment on SOLR-8492 at 1/6/16 11:38 PM:
-------------------------------------------------------------

Gradient ascent finds the maximum of a function. Also, the gradient ascent formula looks roughly
like this:
{code}
wi = wi + alpha*(sigmoid-outcome) * xi
{code}

In this case, we want to find the minimum of the error function, so isn't gradient descent the correct choice?
I also checked the output in the test case:
{code}
Double[] testRecord = {0.0, 0.0, 0.0, 1.0, 1.0, 1.0};
Double[] testWeights = bestWeights.toArray(new Double[bestWeights.size()]);

double d = sum(multiply(testRecord, testWeights));
double prob = sigmoid(d); // prob = 0.999, which is correct (outcome = 1)

Double[] testRecord2 = {1.0, 1.0, 1.0, 0.0, 0.0, 0.0};

d = sum(multiply(testRecord2, testWeights));
prob = sigmoid(d); // prob = 0.5, which is not correct (outcome = 0)
{code}

was (Author: caomanhdat):
Gradient ascent finds the maximum of a function. Also, the gradient ascent formula looks roughly
like this:
{code}
wi = wi + alpha*(sigmoid-outcome) * xi
{code}

In this case, we want to find the minimum of the error function, so isn't gradient descent the correct choice?

> Add LogisticRegressionQuery and LogitStream
> -------------------------------------------
>
>                 Key: SOLR-8492
>                 URL: https://issues.apache.org/jira/browse/SOLR-8492
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>         Attachments: SOLR-8492.patch, SOLR-8492.patch, SOLR-8492.patch
>
>
> This ticket is to add a new query called a LogisticRegressionQuery (LRQ).
> The LRQ extends AnalyticsQuery (http://joelsolr.blogspot.com/2015/12/understanding-solrs-analyticsquery.html)
and returns a DelegatingCollector that implements a Stochastic Gradient Descent (SGD) optimizer
for Logistic Regression.
> This ticket also adds the LogitStream which leverages Streaming Expressions to provide
iteration over the shards. Each call to LogitStream.read() calls down to the shards and executes
the LogisticRegressionQuery. The model data is collected from the shards and the weights are
averaged and sent back to the shards with the next iteration. Each call to read() returns
a Tuple with the averaged weights and error from the shards. With this approach the LogitStream
streams the changing model back to the client after each iteration.
> The LogitStream will return the EOF Tuple when it reaches the defined maxIterations.
When sent as a Streaming Expression to the Stream handler, this provides parallel iterative
behavior. This same approach can be used to implement other parallel iterative algorithms.
> The initial patch has a test which simply tests the mechanics of the iteration. More
work will need to be done to ensure the SGD is properly implemented. The distributed approach
of the SGD will also need to be reviewed.
> This implementation is designed for use cases with a small number of features because
each feature is its own discrete field.
> An implementation which supports a higher number of features would be possible by packing
features into a byte array and storing them as binary DocValues.
> This implementation is designed to support a large sample set. With a large number of
shards, a sample set into the billions may be possible.
> Sample Streaming Expression syntax:
> {code}
> logit(collection1, features="a,b,c,d,e,f", outcome="x", maxIterations="80")
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

```
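On the update rule discussed in the comment: for logistic regression with p = sigmoid(w·x), the cross-entropy error is the negative log-likelihood, so gradient ascent on the log-likelihood and gradient descent on the error are the same step. Ascent adds alpha*(outcome - sigmoid)*xi; descent subtracts alpha*(sigmoid - outcome)*xi. Written with (sigmoid - outcome), the step therefore has to be subtracted. The sketch below computes one step each way so the two can be compared; the class and method names are hypothetical and this is not code from the SOLR-8492 patch.

```java
import java.util.Arrays;

// Hypothetical illustration; not code from the SOLR-8492 patch.
public class LogisticUpdate {

    // Logistic function: maps a score to a probability in (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double dot(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < w.length; i++) {
            s += w[i] * x[i];
        }
        return s;
    }

    // Gradient ascent on the log-likelihood: w_i += alpha * (outcome - p) * x_i.
    static double[] ascentStep(double[] w, double[] x, double outcome, double alpha) {
        double p = sigmoid(dot(w, x));
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = w[i] + alpha * (outcome - p) * x[i];
        }
        return next;
    }

    // Gradient descent on the cross-entropy error: w_i -= alpha * (p - outcome) * x_i.
    static double[] descentStep(double[] w, double[] x, double outcome, double alpha) {
        double p = sigmoid(dot(w, x));
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = w[i] - alpha * (p - outcome) * x[i];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] w = {0.2, -0.4, 0.1};
        double[] x = {1.0, 0.0, 1.0};
        // Both steps lower the weights of the active features, because the outcome is 0
        // while the current score is positive (p > 0.5).
        System.out.println(Arrays.toString(ascentStep(w, x, 0.0, 0.5)));
        System.out.println(Arrays.toString(descentStep(w, x, 0.0, 0.5)));
    }
}
```

Whichever name is used, what matters is that the sign moves the weights toward lower error; a weight whose feature is 0 in a record is left unchanged by that record.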
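The test snippet in the comment relies on sum, multiply and sigmoid helpers that are not shown. A minimal sketch of what they are assumed to do (element-wise product, sum, logistic function) follows; the helper bodies, the LogitScore class and its predict method are hypothetical, and the weights in main are made-up values, not the test's bestWeights. One thing it makes visible: for testRecord2 = {1.0, 1.0, 1.0, 0.0, 0.0, 0.0}, d is simply the sum of the first three weights, so prob = 0.5 means those three weights sum to (approximately) zero.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the helpers used by the test snippet in the comment.
public class LogitScore {

    // Element-wise product of a record and a weight vector of the same length.
    static double[] multiply(Double[] record, Double[] weights) {
        if (record.length != weights.length) {
            throw new IllegalArgumentException("record and weights differ in length");
        }
        double[] out = new double[record.length];
        for (int i = 0; i < record.length; i++) {
            out[i] = record[i] * weights[i];
        }
        return out;
    }

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability that the record's outcome is 1 under the given weights.
    static double predict(Double[] record, List<Double> bestWeights) {
        Double[] weights = bestWeights.toArray(new Double[bestWeights.size()]);
        return sigmoid(sum(multiply(record, weights)));
    }

    public static void main(String[] args) {
        List<Double> weights = Arrays.asList(-1.0, -1.0, -1.0, 2.0, 2.0, 2.0); // made-up
        Double[] testRecord = {0.0, 0.0, 0.0, 1.0, 1.0, 1.0};
        Double[] testRecord2 = {1.0, 1.0, 1.0, 0.0, 0.0, 0.0};
        System.out.println(predict(testRecord, weights));  // sigmoid(6)
        System.out.println(predict(testRecord2, weights)); // sigmoid(-3)
    }
}
```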
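The iteration described in the quoted issue (each read() runs the query on every shard, the shard weights are averaged and sent back with the next iteration, and EOF follows once maxIterations is reached) amounts to parameter averaging. A minimal sketch that simulates it without any Solr classes: each shard is modeled as a function from the current weights to a ShardResult, and every name here is hypothetical. Averaging the error alongside the weights is an assumption about what "averaged weights and error" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical simulation of the iteration described in SOLR-8492; no Solr classes are used.
public class ShardAveraging {

    // What one shard reports after a local pass (assumed shape).
    static final class ShardResult {
        final double[] weights;
        final double error;

        ShardResult(double[] weights, double error) {
            this.weights = weights;
            this.error = error;
        }
    }

    // Element-wise mean of the shard weights; the error is averaged too (assumption).
    static ShardResult average(List<ShardResult> results) {
        int n = results.get(0).weights.length;
        double[] avg = new double[n];
        double errorSum = 0.0;
        for (ShardResult r : results) {
            for (int i = 0; i < n; i++) {
                avg[i] += r.weights[i];
            }
            errorSum += r.error;
        }
        for (int i = 0; i < n; i++) {
            avg[i] /= results.size();
        }
        return new ShardResult(avg, errorSum / results.size());
    }

    // Returns one averaged result per iteration; stopping here stands in for the EOF Tuple.
    static List<ShardResult> run(List<Function<double[], ShardResult>> shards,
                                 double[] initialWeights, int maxIterations) {
        List<ShardResult> emitted = new ArrayList<>();
        double[] weights = initialWeights.clone();
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            List<ShardResult> results = new ArrayList<>();
            for (Function<double[], ShardResult> shard : shards) {
                results.add(shard.apply(weights)); // every shard sees the same weights
            }
            ShardResult averaged = average(results);
            weights = averaged.weights; // sent back to the shards on the next iteration
            emitted.add(averaged);
        }
        return emitted;
    }

    // Toy shard behavior: move every weight by a fixed delta.
    static double[] shift(double[] w, double delta) {
        double[] next = w.clone();
        for (int i = 0; i < next.length; i++) {
            next[i] += delta;
        }
        return next;
    }

    public static void main(String[] args) {
        Function<double[], ShardResult> shardA = w -> new ShardResult(shift(w, 0.25), 0.4);
        Function<double[], ShardResult> shardB = w -> new ShardResult(shift(w, 0.75), 0.2);
        List<ShardResult> out = run(Arrays.asList(shardA, shardB), new double[] {0.0, 0.0}, 3);
        for (ShardResult r : out) {
            System.out.println(Arrays.toString(r.weights) + " error=" + r.error);
        }
    }
}
```

In the stream itself, each entry of the returned list would correspond to one Tuple returned by read(), so the client sees the model change after every iteration.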
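The quoted issue suggests supporting more features by packing them into a byte array stored as binary DocValues. A minimal sketch of the packing step only, assuming 8-byte doubles in ByteBuffer's default big-endian order; the FeaturePacking class and its methods are hypothetical, and wrapping the bytes in a BytesRef for a BinaryDocValuesField is not shown.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helpers; the Lucene/Solr DocValues wiring is not shown.
public class FeaturePacking {

    // Packs a feature vector into 8 bytes per double (ByteBuffer's default big-endian order).
    static byte[] pack(double[] features) {
        ByteBuffer buf = ByteBuffer.allocate(features.length * Double.BYTES);
        for (double f : features) {
            buf.putDouble(f);
        }
        return buf.array();
    }

    // Reverses pack(); the feature count is implied by the byte length.
    static double[] unpack(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] features = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < features.length; i++) {
            features[i] = buf.getDouble();
        }
        return features;
    }

    public static void main(String[] args) {
        double[] features = {0.0, 1.5, -2.25, 3.0};
        byte[] packed = pack(features);
        System.out.println(packed.length + " bytes");
        System.out.println(Arrays.toString(unpack(packed)));
    }
}
```

Storing a fixed 8 bytes per feature keeps decoding trivial; a more compact encoding (for example floats, or sparse index/value pairs) would trade that simplicity for space.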