hivemall-user mailing list archives

From Makoto Yui <yuin...@gmail.com>
Subject Re: Random Forest trained on Kaggle Titanic, but the accuracy is just 0.65, which is lower than the 0.765 the user guide reports
Date Thu, 13 Apr 2017 04:56:31 GMT
Hi Mars,

Could you share the training and test data with me?

Also, I need the code snippets you used for training and testing to
reproduce your result.

Thanks,
Makoto

2017-04-11 17:10 GMT+09:00 Mars Xu <xujiao.mycafe@gmail.com>:

> Hi users,
>
>     I built Hivemall 0.4.2 with Spark 2.1.0, then ran the random forest
> algorithm on the Kaggle Titanic tutorial
> (https://hivemall.incubator.apache.org/userguide/binaryclass/titanic_rf.html).
>
>     There is one point where I didn't follow the guide. In the data
> preparation part, when I ran this command,
>
> awk '{ FPAT="([^,]*)|(\"[^\"]+\")";OFS="|"; } NR >1 {$1=$1;$4=substr($4,2,length($4)-2);print $0}' train.csv
>
>     the output data was not right, as shown below:
>
>
> So I just used ',' as the field delimiter instead, and that got an accuracy
> of 0.655 on the Kaggle platform.
>
> Is there anything I can do to correct this result?
>
>
>
> Thanks so much!
> Mars.
>
>
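Two details about the data-preparation step may matter here. First, the awk one-liner relies on `FPAT`, which is a GNU awk (gawk) extension; mawk and the BSD awk on macOS treat it as an ordinary variable, so the output there may be garbled (this is an assumption about the cause, since the sample output did not survive in the archive). Second, in Kaggle's `train.csv` the quoted `Name` field contains a comma (e.g. `"Braund, Mr. Owen Harris"`), so splitting on every `,` produces one extra column and shifts every later feature. The sketch below is not the guide's method: it swaps in Python's standard `csv` module to perform the same conversion the awk command aims for (skip the header, unquote `Name`, re-join with `|`), and contrasts it with a naive comma split. `SAMPLE`, `naive_split`, and `to_pipe_delimited` are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical two-line sample in the layout of Kaggle's Titanic train.csv.
# The Name field is quoted because it contains a comma.
SAMPLE = (
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
)


def naive_split(line: str) -> list[str]:
    """Split on every comma, ignoring quotes (what a plain ',' delimiter does)."""
    return line.rstrip("\n").split(",")


def to_pipe_delimited(text: str) -> list[str]:
    """Skip the header, parse quoted CSV fields, and re-join each record with '|'.

    csv.reader already removes the surrounding quotes from Name, which is what
    substr($4, 2, length($4)-2) does in the awk command.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (awk: NR > 1)
    return ["|".join(row) for row in reader]


header, row = SAMPLE.splitlines()
print(len(naive_split(header)), len(naive_split(row)))  # 12 13
print(to_pipe_delimited(SAMPLE)[0])
# 1|0|3|Braund, Mr. Owen Harris|male|22|1|0|A/5 21171|7.25||S
```

The mismatch between 12 header columns and 13 data columns is the point: with a plain `,` delimiter every field after `Name` lands in the wrong column, which could plausibly explain a lower Kaggle score. Using gawk explicitly (or a real CSV parser, as above) keeps the columns aligned.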
