[ https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-12197:

Description:
Currently the *train* Streaming Expression trains a logistic regression model by iterating
over the entire distributed training set on each training iteration. Each iteration builds a
matrix on each shard with one row per training example held by that shard and one column per
feature, which can produce very large matrices when working with large training and feature
sets.
This ticket will add a *sample* parameter that limits the training set used on each iteration
to a random sample. This will allow for much larger training sets.
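The matrix-size saving can be illustrated with a minimal NumPy sketch of gradient-descent logistic regression that optionally draws a random row sample on each iteration. This is only an illustration of the sampling idea, not Solr's implementation; the function name and parameters here are hypothetical and unrelated to the actual *train* expression syntax.

```python
import numpy as np

def train_logistic(X, y, sample_size=None, iterations=100, lr=0.1, seed=0):
    """Gradient-descent logistic regression.

    When sample_size is set, each iteration operates on a random sample
    of the rows instead of the full matrix: the idea behind the proposed
    *sample* parameter."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iterations):
        if sample_size is not None and sample_size < n:
            idx = rng.choice(n, size=sample_size, replace=False)
            Xb, yb = X[idx], y[idx]            # smaller per-iteration matrix
        else:
            Xb, yb = X, y                      # current behavior: full training set
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # predicted probabilities
        w -= lr * Xb.T @ (p - yb) / len(yb)    # average-gradient step
    return w

# Tiny separable example: negative x -> class 0, positive x -> class 1.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w_full = train_logistic(X, y)                    # full-batch training
w_sampled = train_logistic(X, y, sample_size=2)  # 2 rows per iteration
```

Both runs learn a positive weight on this data; the sampled run only ever materializes a 2-row matrix per iteration, which is the memory saving the ticket is after.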
was:
Currently the *train* Streaming Expression trains a logistic regression model by iterating
over the entire distributed training set on each pass. Each iteration involves building a
matrix on each shard with the number of rows equal to the size of the training set contained
on the shard. The number of columns will be the number of features. This scenario can create
very large matrices when working with large training sets and feature sets.
This ticket will add a *sample* parameter which will limit the size of the training set on
each iteration to a random sample of the training set. This will allow for much larger training
sets.
> Implement sampling for logistic regression classifier
> 
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model by iterating
over the entire distributed training set on each training iteration. Each iteration builds a
matrix on each shard with one row per training example held by that shard and one column per
feature, which can produce very large matrices when working with large training and feature
sets.
> This ticket will add a *sample* parameter that limits the training set used on each
iteration to a random sample. This will allow for much larger training sets.

This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
