spark-dev mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: LogisticRegression: Predicting continuous outcomes
Date Thu, 29 May 2014 02:46:27 GMT
Bharath (apologies if you're already familiar with the theory): the
proposed approach may or may not be appropriate, depending on the overall
transfer function in your data. In general, a single logistic regressor
computes one fixed sigmoid of a linear combination of the inputs, so it
cannot approximate arbitrary non-linear functions. You can review the works
of, e.g., Hornik and Cybenko from the late 1980s to see whether you need
something more, such as a simple neural network with one hidden layer.

This is a good summary:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf
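
As a rough illustration of the limitation described above, here is a
minimal sketch in plain Python (not MLlib code). The helper names
fit_logistic_sgd and mse, the learning rate, and the two synthetic targets
are assumptions made up for this example, and the real-valued labels are
assumed to lie in [0, 1]. It fits a single logistic unit by SGD on the
cross-entropy loss, once against a target that really is a sigmoid of a
linear function and once against a non-monotone "bump":

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sgd(xs, ys, epochs=200, lr=0.5, seed=0):
    """Fit y ~ sigmoid(w * x + b) by SGD on cross-entropy loss.

    Labels may be any real values in [0, 1], not only 0 or 1.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of the cross-entropy loss w.r.t. (w * x + b).
            err = sigmoid(w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(xs, ys, w, b):
    # Mean squared error of the fitted unit on the given points.
    return sum((sigmoid(w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10.0 for i in range(-30, 31)]          # 61 points in [-3, 3]
monotone = [sigmoid(2.0 * x - 1.0) for x in xs]  # sigmoid of a linear function
bump = [math.exp(-x * x) for x in xs]            # non-monotone target

w1, b1 = fit_logistic_sgd(xs, monotone)
w2, b2 = fit_logistic_sgd(xs, bump)
mse_monotone = mse(xs, monotone, w1, b1)
mse_bump = mse(xs, bump, w2, b2)

print("MSE on monotone target:", mse_monotone)
print("MSE on bump target:    ", mse_bump)
```

The first target is exactly representable by one logistic unit, so the
error should shrink toward zero. The second is not monotone in x, and since
sigmoid(w * x + b) is always monotone in x, no choice of w and b can follow
it closely. That is the kind of case where a hidden layer would be needed.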

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Wed, May 28, 2014 at 11:18 AM, Bharath Ravi Kumar <reachbach@gmail.com> wrote:

> I'm looking to reuse the LogisticRegression model (with SGD) to predict a
> real-valued outcome variable. (I understand that logistic regression is
> generally applied to predict a binary outcome, but for various reasons, this
> model suits our needs better than LinearRegression.) Related to that, I have
> the following questions:
>
> 1) Can the current LogisticRegression model be used as is to train based on
> binary input (i.e. explanatory) features, or is there an assumption that
> the explanatory features must be continuous?
>
> 2) I intend to reuse the current class to train a model on LabeledPoints
> where the label is a real value (and not 0 / 1). I'd like to know if
> invoking setValidateData(false) would suffice or if one must override the
> validator to achieve this.
>
> 3) I recall seeing an experimental method on the class (
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
> )
> that clears the threshold separating positive & negative predictions. Once
> the model is trained on real-valued labels, would clearing this threshold
> suffice to predict an outcome that is continuous in nature?
>
> Thanks,
> Bharath
>
> P.S: I'm writing to dev@ and not user@ assuming that lib changes might be
> necessary. Apologies if the mailing list is incorrect.
>
