systemml-dev mailing list archives

From "Niketan Pansare" <npan...@us.ibm.com>
Subject Re: Using GLM-predict
Date Tue, 08 Dec 2015 22:08:27 GMT

Hi Sourav,

For some reason, I didn't receive your email of "Tue, 08 Dec 2015 12:56:38
-0800" (which I noticed in the archive).

>> Not sure how exactly I can modify the GLM-predict.dml to get some
prediction to start with.
There are two options here:
1. Modify GLM-predict.dml as suggested by Shirish (better approach with
respect to the SystemML optimizer) or

2. Run a new script on the output of GLM-predict. Please see:
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163
If you choose option 2, you may also want to read the documentation for
the following two built-in functions:
a. rowIndexMax (See
http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions
)
b. ppred

>> Can you give me some idea how from here I can calculate the predicted
value of the label using some value of probability threshold ?
A very simple way to predict the label from the probability matrix:
Prediction = rowIndexMax(Prob)  # predicts the label with the highest
probability. This assumes one-based labels.
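
To illustrate what the line above computes, here is a minimal sketch in plain
Python (not DML, and not SystemML's implementation). The helper name
row_index_max and the sample matrix are hypothetical; the sketch assumes each
row of the probability matrix holds one probability per class, and it breaks
ties by taking the first column, which is a choice made for this example only.

```python
def row_index_max(prob):
    """Return the one-based column index of the largest value in each row."""
    labels = []
    for row in prob:
        # max() returns the first index that attains the maximum on ties.
        best = max(range(len(row)), key=lambda j: row[j])
        labels.append(best + 1)  # shift from zero-based to one-based labels
    return labels


# Hypothetical probability matrix: 3 observations, 3 classes.
prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

prediction = row_index_max(prob)
print(prediction)  # [1, 3, 2]
```

If the training labels were zero-based, the "+ 1" shift would have to be
undone before comparing predictions with the true labels.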

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Shirish Tatikonda <shirish.tatikonda@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	12/08/2015 12:49 PM
Subject:	Re: Using GLM-predict



Hi Sourav,

Yes, GLM-predict.dml outputs only the probabilities. You can apply a
threshold to the resulting probabilities to get the actual class labels;
for example, prob > 0.5 is positive and prob <= 0.5 is negative.
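
As a rough illustration of the thresholding rule, here is a small sketch in
plain Python (not DML). The function name threshold_labels, the 1/0 encoding
of positive/negative, and the sample probabilities are assumptions made for
this example; the input is taken to be the positive-class probability of each
observation.

```python
def threshold_labels(positive_prob, threshold=0.5):
    """Map each positive-class probability to 1 (positive) or 0 (negative)."""
    # Strictly greater than the threshold is positive; equal counts as negative.
    return [1 if p > threshold else 0 for p in positive_prob]


# Hypothetical positive-class probabilities for four observations.
positive_prob = [0.91, 0.5, 0.12, 0.63]

labels = threshold_labels(positive_prob)
print(labels)  # [1, 0, 0, 1]
```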

The exact value of threshold typically depends on the data and the
application. Different thresholds yield different classifiers with
different performance (precision, recall, etc.). You can find the best
threshold for the given data set by finding a value that gives the desired
classifier performance (for example, a threshold that gives roughly equal
precision and recall). This tuning is done during the training phase using
a held-out data set.
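
One possible way to carry out such a search is sketched below in plain Python
(not DML). It tries a few candidate thresholds on held-out data and keeps the
one where precision and recall are closest. The function names, the candidate
grid, the 1/0 label encoding, and the sample data are all hypothetical;
thresholds that produce no true positives are skipped, which is a design
choice of this sketch.

```python
def precision_recall(y_true, positive_prob, threshold):
    """Compute (precision, recall) for labels predicted as prob > threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, positive_prob):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balanced_threshold(y_true, positive_prob, candidates):
    """Return the candidate with the smallest |precision - recall|, or None."""
    best, best_gap = None, float("inf")
    for t in candidates:
        precision, recall = precision_recall(y_true, positive_prob, t)
        if precision == 0.0 and recall == 0.0:
            continue  # no true positives at this threshold; skip it
        gap = abs(precision - recall)
        if gap < best_gap:
            best, best_gap = t, gap
    return best


# Hypothetical held-out labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
positive_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

print(balanced_threshold(y_true, positive_prob, [0.25, 0.5, 0.75]))  # 0.5
```

A different target (for example, maximizing recall subject to a minimum
precision) would only change how the candidates are ranked inside the loop.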

If you wish, you can also modify the DML script to perform this entire
process.

Shirish


On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <
sourav.mazumder00@gmail.com> wrote:

> Hi,
>
> I have used GLM.dml to create a model using some sample data. It returns to
> me the matrix of Beta, B.
>
> Now I want to use this matrix of Beta on a new set of data points and
> generate predicted value of the dependent variable/observation.
>
> When I checked GLM-predict, I could see that one can pass feature vector
> for the new data set and also the matrix of beta.
>
> But I could not see any way to get the predicted value of the dependent
> variable/observation. The output parameter only supports matrix of
> predicted means/probabilities.
>
> Is there a way one can get the predicted value of the dependent
> variable/observation from GLM-predict ?
>
> Regards,
> Sourav
>

