hadoop-user mailing list archives

From Abhishek Shivkumar <abhisheksgum...@gmail.com>
Subject Re: WEKA logistic regression on hadoop
Date Tue, 16 Oct 2012 14:07:59 GMT
As far as I know, Weka cannot be run on Hadoop directly.
What you can do, if your algorithm first builds a model from training data, is run the
training offline on your laptop and serialize the result, i.e. write the trained model
to a file. Then put this model file on HDFS and read it inside the setup method of your
MapReduce program.

As you read your input in the map method, you can use the trained model to make a decision
for each record, such as a classification or the output of another supervised machine
learning algorithm.
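
The steps above can be sketched roughly as follows. This is only a minimal illustration
that runs without Hadoop or Weka on the classpath: TrainedModel, ModelMapperSketch and
saveModel are hypothetical names, the linear decision rule is a stand-in for whatever
classifier was trained offline, and a local file stands in for the model file on HDFS.
In a real job the class would extend org.apache.hadoop.mapreduce.Mapper and setup would
open the file through the Hadoop FileSystem API; for Weka classifiers,
weka.core.SerializationHelper is one way to write and read the model.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a model trained offline (e.g. a Weka classifier). */
final class TrainedModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double[] weights;
    private final double bias;

    TrainedModel(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Assumed linear decision rule: a positive score means class_1. */
    String classify(double[] features) {
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * features[i];
        }
        return score > 0 ? "class_1" : "class_2";
    }
}

/** Mirrors the setup()/map() lifecycle of a Hadoop Mapper without depending on Hadoop. */
final class ModelMapperSketch {
    private TrainedModel model;

    /** Offline step: serialize the trained model to a file. */
    static void saveModel(TrainedModel model, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called once per task: load the model before any records arrive. */
    void setup(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            model = (TrainedModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class missing from classpath", e);
        }
    }

    /** Called per input record: parse comma-separated features and classify them. */
    String map(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            features[i] = Double.parseDouble(parts[i].trim());
        }
        return model.classify(features);
    }
}
```

Loading the model in setup rather than in map means it is deserialized once per task
instead of once per record.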

I did this for SVM and it did work.
I am interested to know if anyone else has tried an alternative method to port Weka
algorithms to Hadoop.

Thanks!
With Regards,
Abhishek S

On Oct 16, 2012, at 7:16 PM, Rajesh Nikam <rajeshnikam@gmail.com> wrote:

> Hi,
> 
> I was looking for logistic regression algorithms on hadoop.
> Mahout is one good package to use on Hadoop; however, I am not able to get good results
> with my experiments.
> 
> There are logistic regression algorithms supported with WEKA which I have used on Windows.
> I guess I should be able to run these algos from JAR files as is on linux.
> 
> java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M 6 -t lr.arff

> 
> Has anyone ported them to take advantage of Hadoop?
> 
> How do I interpret the output it generates? For example, what are the Coefficients and
> Odds Ratios, and how can they be used for classification?
> 
> 
> Options: -R 1.0E-8 -M 6 
> 
> Logistic Regression with ridge parameter of 1.0E-8
> Coefficients...
>                  Class
> Variable       class_1
> ======================
> a1                   0
> a2                   0
> a3                   0
> a4              0.0082
> a5              0.0151
> a6             -0.1034
> a7                   0
> a8                   0
> a9                   0
> a10            -0.0397
> a11            -0.0003
> a13            -0.1195
> a14            -0.1389
> Intercept      -21.487
> 
> 
> Odds Ratios...
>                  Class
> Variable       class_1
> ======================
> a1                   1
> a2                   1
> a3                   1
> a4              1.0083
> a5              1.0152
> a6              0.9018
> a7                   1
> a8                   1
> a9                   1
> a10              0.961
> a11             0.9997
> a13             0.8873
> a14             0.8703
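
On how the two tables above relate: each odds ratio is the exponential of the matching
coefficient, so a coefficient of 0 gives an odds ratio of 1 (no effect on the odds) and
a negative coefficient gives an odds ratio below 1. The sketch below illustrates this and
the usual logistic scoring rule. It assumes a two-class model in which the coefficients
describe class_1 relative to class_2; LogisticInterpretation and its method names are
hypothetical, and the printed coefficients are rounded, so recomputed odds ratios can
differ from the table in the last digit.

```java
/** Hypothetical helper for reading a logistic regression coefficient table. */
final class LogisticInterpretation {

    /** Odds ratio for a one-unit increase in an attribute: exp(coefficient). */
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    /**
     * Probability of class_1 under the assumed two-class logistic model,
     * where z = intercept + sum(coefficients[i] * x[i]).
     */
    static double probabilityOfClass1(double intercept, double[] coefficients, double[] x) {
        double z = intercept;
        for (int i = 0; i < coefficients.length; i++) {
            z += coefficients[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

For example, oddsRatio(0.0082) is about 1.0082, in line with the 1.0083 shown for a4
given rounding, and oddsRatio(-0.1034) is about 0.9018, the value shown for a6. To
classify a record, compute probabilityOfClass1 from its attribute values and predict
class_1 when the result is above 0.5.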
> 
> Time taken to build model: 6.39 seconds
> Time taken to test model on training data: 1.86 seconds
> 
> === Error on training data ===
> 
> Correctly Classified Instances       49528               99.9173 %
> Incorrectly Classified Instances        41                0.0827 %
> Kappa statistic                          0.9983
> Mean absolute error                      0.0011
> Root mean squared error                  0.0244
> Relative absolute error                  0.2202 %
> Root relative squared error              4.895  %
> Total Number of Instances            49569     
> 
> 
> === Confusion Matrix ===
> 
>      a     b   <-- classified as
>  26526    37 |     a = class_1
>      4 23002 |     b = class_2
> 
> 
> 
> === Stratified cross-validation ===
> 
> Correctly Classified Instances       49492               99.8447 %
> Incorrectly Classified Instances        77                0.1553 %
> Kappa statistic                          0.9969
> Mean absolute error                      0.0015
> Root mean squared error                  0.0358
> Relative absolute error                  0.3108 %
> Root relative squared error              7.1718 %
> Total Number of Instances            49569     
> 
> 
> === Confusion Matrix ===
> 
>      a     b   <-- classified as
>  26532    31 |     a = class_1
>     46 22960 |     b = class_2
> 
> Thanks in advance.
> Rajesh 
