systemml-dev mailing list archives

From Sourav Mazumder <sourav.mazumde...@gmail.com>
Subject Re: Using GLM-predict
Date Wed, 09 Dec 2015 05:40:29 GMT
Hi Niketan,

Thanks again for the detailed inputs.

A few more follow-up questions:

1. In GLM-predict.dml I can see that 'means' is the output variable. As I
understand it, this is the same as the probability matrix you mentioned in
your mail (to be used to compute the prediction). Am I right?

2. From GLM.dml I get the betas as output using
outputs.getBinaryBlockedRDD("beta_out"). I pass the same to GLM-predict.dml
as B. To register B, I use the following statements:
val beta = outputs.getBinaryBlockedRDD("beta_out")
// I have four feature vectors, so I get 4 coefficients
ml.registerInput("B", beta, 1, 4)

However, when I execute GLM-predict.dml I get the following error:

val outputs = ml.execute("/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml", cmdLineParams)

15/12/09 05:32:47 WARN Expression: Metadata file:  .mtd not provided
15/12/09 05:32:47 ERROR Expression: ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement:  .mtd
com.ibm.bi.dml.parser.LanguageException: Invalid Parameters : ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement:  .mtd

Line 117 contains the following statement: X = read (fileX);

3. Say I get back the prediction matrix as an output (from predictions =
rowIndexMax(means);). Can I then add it as a column to my original data
frame (the one from which I created the feature vectors for the original
model)? My concern is whether adding it back will preserve the row order,
so that the key for each feature vector and its predicted value stay
matched. If not, how can I achieve that?
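
To make question 3 concrete, here is a minimal sketch in plain Python (not DML and not the SystemML API) of what is being asked for: take the row-wise index of the largest probability, one-based as described for rowIndexMax in this thread, and pair each prediction with a row key instead of relying on position alone. The names row_index_max, attach_predictions, keys and the sample values are hypothetical. The sketch assumes the key was carried alongside each feature vector; whether SystemML preserves row order between input and output is not established here.

```python
def row_index_max(means):
    """Return the one-based column index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in means]


def attach_predictions(keys, means):
    """Pair each row key with its predicted label.

    Assumes keys[i] and means[i] describe the same observation, i.e. the
    key travelled with the feature vector and the rows were not reordered.
    """
    if len(keys) != len(means):
        raise ValueError("keys and means must have the same number of rows")
    return dict(zip(keys, row_index_max(means)))


# Hypothetical data: three observations, two classes.
keys = ["id-7", "id-3", "id-9"]
means = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
predictions = attach_predictions(keys, means)
```

With an explicit key per row, the predictions can be joined back to the original data frame on that key, so the result does not depend on the two sides happening to share the same ordering.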

Regards,
Sourav





On Tue, Dec 8, 2015 at 2:08 PM, Niketan Pansare <npansar@us.ibm.com> wrote:

> Hi Sourav,
>
> For some reason, I didn't get your email of "Tue, 08 Dec 2015 12:56:38
> -0800" (which I noticed in the archive):
> <https://www.mail-archive.com/search?l=dev@systemml.incubator.apache.org&q=date:20151208>
>
> >> Not sure how exactly I can modify the GLM-predict.dml to get some
> prediction to start with.
> There are two options here:
> 1. Modify GLM-predict.dml as suggested by Shirish (better approach with
> respect to the SystemML optimizer) or
>
> 2. Run a new script on the output of GLM-predict. Please see:
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163
> If you choose to go with option 2, you might also want to read the
> documentation of the following two built-in functions:
> a. rowIndexMax (See
> http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions
> )
> b. ppred
>
> >> Can you give me some idea how from here I can calculate the predicted
> value of the label using some value of probability threshold ?
> A very simple way to predict the label given the probability matrix:
> Prediction = rowIndexMax(Prob) # predicts the label with highest
> probability. This assumes one-based labels.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> From: Shirish Tatikonda <shirish.tatikonda@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 12/08/2015 12:49 PM
> Subject: Re: Using GLM-predict
> ------------------------------
>
>
>
> Hi Sourav,
>
> Yes, GLM-predict.dml gives out only the probabilities. You can put a
> threshold on the resulting probabilities to get the actual class labels --
> for example, prob > 0.5 is positive and prob <= 0.5 is negative.
>
> The exact value of threshold typically depends on the data and the
> application. Different thresholds yield different classifiers with
> different performance (precision, recall, etc.). You can find the best
> threshold for the given data set by finding a value that gives the desired
> classifier performance (for example, a threshold that gives roughly equal
> precision and recall). Such an optimization is typically done during the
> training phase using a held-out test set.
>
> If you wish, you can also modify the DML script to perform this entire
> process.
>
> Shirish
>
>
> On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
> > Hi,
> >
> > I have used GLM.dml to create a model using some sample data. It returns
> > to me the matrix of Beta, B.
> >
> > Now I want to use this matrix of Beta on a new set of data points and
> > generate predicted value of the dependent variable/observation.
> >
> > When I checked GLM-predict, I could see that one can pass feature vector
> > for the new data set and also the matrix of beta.
> >
> > But I could not see any way to get the predicted value of the dependent
> > variable/observation. The output parameter only supports matrix of
> > predicted means/probabilities.
> >
> > Is there a way one can get the predicted value of the dependent
> > variable/observation from GLM-predict ?
> >
> > Regards,
> > Sourav
> >
>
>
>
