spark-user mailing list archives

From Rohit Chaddha <rohitchaddha1...@gmail.com>
Subject Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data
Date Mon, 25 Jul 2016 09:40:06 GMT
Hi Krishna,

Great, I had no idea about this. I tried your suggestion of using
na.drop() and got an RMSE of 1.5794048211812495.
Any suggestions on how this can be reduced and the model improved?

Regards,
Rohit

On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar <ksankar42@gmail.com> wrote:

> Thanks Nick. I also ran into this issue.
> VG, one workaround is to drop the NaN rows from the predictions
> (df.na.drop()) and then pass the resulting dataset to the evaluator. In real
> life, you would probably detect the NaN and recommend the most popular items
> over some window.
> HTH.
> Cheers
> <k/>
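
A minimal plain-Python sketch of the two ideas above, not Spark code: drop
the NaN predictions before computing RMSE, and fall back to the most popular
items when a prediction is missing. The rows, the interaction window, and the
helper names (drop_nan, rmse, most_popular) are all hypothetical and only
illustrate the arithmetic.

```python
import math
from collections import Counter

# Hypothetical (user, item, rating, prediction) rows.
# NaN marks a prediction for an id that was not seen in training.
rows = [
    ("u1", "i1", 4.0, 3.5),
    ("u2", "i2", 2.0, 2.5),
    ("u3", "i9", 5.0, float("nan")),
]

def drop_nan(rows):
    """Keep only rows with a real-valued prediction (analogue of df.na.drop())."""
    return [r for r in rows if not math.isnan(r[3])]

def rmse(rows):
    """Root-mean-square error between the rating and the prediction."""
    return math.sqrt(sum((r[2] - r[3]) ** 2 for r in rows) / len(rows))

def most_popular(interactions, k):
    """Fallback: the k most frequently rated items in a window of interactions."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]

clean = drop_nan(rows)
score = rmse(clean)  # finite, because the NaN row is gone

# Hypothetical recent (user, item) interactions used as the popularity window.
window = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"),
          ("u4", "i1"), ("u5", "i2"), ("u6", "i3")]
fallback = most_popular(window, 2)
```

Note that dropping rows only changes what is evaluated; the unseen users
still need some recommendation, which is what the popularity fallback covers.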
>
> On Sun, Jul 24, 2016 at 12:49 PM, Nick Pentreath <nick.pentreath@gmail.com> wrote:
>
>> It seems likely that you're running into
>> https://issues.apache.org/jira/browse/SPARK-14489 - this occurs when the
>> test dataset in the train/test split contains users or items that were not
>> in the training set. Hence the model doesn't have computed factors for
>> those ids, and ALS 'transform' currently returns NaN for those ids. This in
>> turn results in NaN for the evaluator result.
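
To make the mechanism concrete, here is a small plain-Python sketch (not
Spark's implementation) of how one id without factors turns the whole RMSE
into NaN. The factor values, ids, and the predict/rmse helpers are
hypothetical.

```python
import math

# Hypothetical learned factors. User "u3" was not in the training set,
# so there is no entry for it.
user_factors = {"u1": [1.0, 0.5], "u2": [0.2, 1.0]}
item_factors = {"i1": [2.0, 1.0], "i2": [0.5, 2.0]}

def predict(user, item):
    """Dot product of the two factor vectors, or NaN if either id is unknown."""
    u = user_factors.get(user)
    v = item_factors.get(item)
    if u is None or v is None:
        return float("nan")
    return sum(a * b for a, b in zip(u, v))

def rmse(pairs):
    """Root-mean-square error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

# Hypothetical test split: the last row refers to the unseen user "u3".
test_set = [("u1", "i1", 3.0), ("u2", "i2", 2.0), ("u3", "i1", 4.0)]
pairs = [(rating, predict(user, item)) for user, item, rating in test_set]

# A single NaN in the sum of squared errors makes the result NaN.
result = rmse(pairs)
```

Any arithmetic involving NaN yields NaN, so one unseen id in the test split
is enough to poison the metric.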
>>
>> I have a PR open on that issue that will hopefully address this soon.
>>
>>
>> On Sun, 24 Jul 2016 at 17:49 VG <vlinked@gmail.com> wrote:
>>
>>> Ping. Does anyone have suggestions or advice for me?
>>> It would be really helpful.
>>>
>>> VG
>>> On Sun, Jul 24, 2016 at 12:19 AM, VG <vlinked@gmail.com> wrote:
>>>
>>>> Sean,
>>>>
>>>> I did this just to test the model. When I split my data into 80%
>>>> training and 20% test,
>>>>
>>>> I get a Root-mean-square error = NaN
>>>>
>>>> So I am wondering where I might be going wrong
>>>>
>>>> Regards,
>>>> VG
>>>>
>>>> On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>>
>>>>> No, that's certainly not to be expected. ALS works by computing a much
>>>>> lower-rank representation of the input. It would not reproduce the
>>>>> input exactly, and you don't want it to -- this would be seriously
>>>>> overfit. This is why in general you don't evaluate a model on the
>>>>> training set.
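
As a rough illustration of the point above, the following plain-Python sketch
fits a rank-1 factorization to a small, fully observed matrix by alternating
least squares and then measures the error on that same matrix. It is a toy,
unregularized version and not Spark's ALS; the matrix R, the starting vector,
and the helper name als_rank1 are hypothetical.

```python
import math

# Hypothetical fully observed 3x3 rating matrix. It has rank 3, so a
# rank-1 model cannot reproduce it exactly.
R = [[5.0, 1.0, 1.0],
     [1.0, 5.0, 1.0],
     [1.0, 1.0, 5.0]]

def als_rank1(R, iterations=50):
    """Alternate closed-form least-squares updates for u and v so that R ~ u v^T."""
    n_users, n_items = len(R), len(R[0])
    v = [1.0 / (j + 1) for j in range(n_items)]  # arbitrary non-zero start
    u = [0.0] * n_users
    for _ in range(iterations):
        vv = sum(x * x for x in v)
        u = [sum(R[i][j] * v[j] for j in range(n_items)) / vv
             for i in range(n_users)]
        uu = sum(x * x for x in u)
        v = [sum(R[i][j] * u[i] for i in range(n_users)) / uu
             for j in range(n_items)]
    return u, v

u, v = als_rank1(R)

# Evaluate on the same data the model was fitted to.
squared_error = sum((R[i][j] - u[i] * v[j]) ** 2
                    for i in range(3) for j in range(3))
train_rmse = math.sqrt(squared_error / 9)
```

Even when evaluated on its own training data, the rank-1 model leaves a
clearly non-zero RMSE, because the factorization has fewer degrees of freedom
than the input.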
>>>>>
>>>>> On Sat, Jul 23, 2016 at 7:37 PM, VG <vlinked@gmail.com> wrote:
>>>>> > I am trying to run ml.ALS to compute some recommendations.
>>>>> >
>>>>> > Just to test, I am using the same dataset both for training the
>>>>> > ALSModel and for predicting the results based on the model.
>>>>> >
>>>>> > When I evaluate the result using RegressionEvaluator I get a
>>>>> > Root-mean-square error = 1.5544064263236066
>>>>> >
>>>>> > I think this should be 0. Any suggestions as to what might be going wrong?
>>>>> >
>>>>> > Regards,
>>>>> > Vipul
>>>>>
>>>>
>>>>
>
