Subject: Re: WEKA logistic regression on hadoop
From: Abhishek Shivkumar
Date: Tue, 16 Oct 2012 19:37:59 +0530
To: "user@hadoop.apache.org"

As far as I know, Weka cannot be run on Hadoop directly.

What can be done is this: if your algorithm first generates a model from training data, you can run the training offline on your laptop and serialize it, i.e. write the trained model to a file. Then put this model file on HDFS and read it inside the setup method of your MapReduce program.

As you read your input in the mapper method, you can use the trained model to make any decision, such as a classification or another supervised machine learning prediction.

I did this for SVM and it did work.

I am interested to know if anyone else has tried any alternate method to port Weka algorithms to Hadoop.

Thanks!

With Regards,
Abhishek S

On Oct 16, 2012, at 7:16 PM, Rajesh Nikam wrote:

> Hi,
>
> I was looking for logistic regression algorithms on hadoop.
> mahout is one good package to use on hadoop, however I am not able to get good results with my experiments.
>
> There are logistic regression algorithms supported with WEKA which I have used on Windows.
> I guess I should be able to run these algos from JAR files as is on linux.
>
> java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M 6 -t lr.arff
>
> Has anyone ported them to take advantage of hadoop?
>
> How do I interpret the generated output, i.e. what are the Coefficients and Odds Ratios, and how could they be used for classification?
>
>
> Options: -R 1.0E-8 -M 6
>
> Logistic Regression with ridge parameter of 1.0E-8
> Coefficients...
>                  Class
> Variable       class_1
> ======================
> a1                   0
> a2                   0
> a3                   0
> a4              0.0082
> a5              0.0151
> a6             -0.1034
> a7                   0
> a8                   0
> a9                   0
> a10            -0.0397
> a11            -0.0003
> a13            -0.1195
> a14            -0.1389
> Intercept      -21.487
>
>
> Odds Ratios...
>                  Class
> Variable       class_1
> ======================
> a1                   1
> a2                   1
> a3                   1
> a4              1.0083
> a5              1.0152
> a6              0.9018
> a7                   1
> a8                   1
> a9                   1
> a10              0.961
> a11             0.9997
> a13             0.8873
> a14             0.8703
>
> Time taken to build model: 6.39 seconds
> Time taken to test model on training data: 1.86 seconds
>
> === Error on training data ===
>
> Correctly Classified Instances       49528      99.9173 %
> Incorrectly Classified Instances        41       0.0827 %
> Kappa statistic                          0.9983
> Mean absolute error                      0.0011
> Root mean squared error                  0.0244
> Relative absolute error                  0.2202 %
> Root relative squared error              4.895 %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26526    37 |     a = class_1
>      4 23002 |     b = class_2
>
>
>
> === Stratified cross-validation ===
>
> Correctly Classified Instances       49492      99.8447 %
> Incorrectly Classified Instances        77       0.1553 %
> Kappa statistic                          0.9969
> Mean absolute error                      0.0015
> Root mean squared error                  0.0358
> Relative absolute error                  0.3108 %
> Root relative squared error              7.1718 %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26532    31 |     a = class_1
>     46 22960 |     b = class_2
>
> Thanks in advance.
> Rajesh
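
On the quoted question about reading the output: the two tables appear to be related by odds ratio = exp(coefficient). For a6, exp(-0.1034) is about 0.9018, which matches the Odds Ratios table. The sketch below shows that relation and how the coefficients could score one record, for example inside a mapper. It assumes the standard binary logistic form with class_2 as the reference class, which this thread does not confirm. The class name LogisticOutputSketch, its methods, and the sample attribute values are hypothetical and not part of Weka. The coefficients are the rounded values from the quoted output, so results can differ from Weka's in the last digit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Weka class: interprets the printed coefficients by hand.
class LogisticOutputSketch {
    // Intercept and non-zero coefficients copied from the quoted output (class_1 column).
    static final double INTERCEPT = -21.487;
    static final Map<String, Double> COEFFICIENTS = new LinkedHashMap<>();
    static {
        COEFFICIENTS.put("a4", 0.0082);
        COEFFICIENTS.put("a5", 0.0151);
        COEFFICIENTS.put("a6", -0.1034);
        COEFFICIENTS.put("a10", -0.0397);
        COEFFICIENTS.put("a11", -0.0003);
        COEFFICIENTS.put("a13", -0.1195);
        COEFFICIENTS.put("a14", -0.1389);
    }

    // Odds ratio: the factor by which the odds of class_1 change
    // when the attribute increases by one unit.
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    // Assumed binary logistic form: P(class_1 | x) = 1 / (1 + exp(-(intercept + sum of coef * x))).
    // Attributes missing from the instance are treated as 0 in this sketch.
    static double probabilityClass1(Map<String, Double> instance) {
        double logOdds = INTERCEPT;
        for (Map.Entry<String, Double> e : COEFFICIENTS.entrySet()) {
            logOdds += e.getValue() * instance.getOrDefault(e.getKey(), 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-logOdds));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : COEFFICIENTS.entrySet()) {
            System.out.printf("%-4s coefficient=%8.4f  odds ratio=%.4f%n",
                    e.getKey(), e.getValue(), oddsRatio(e.getValue()));
        }

        // Made-up attribute values, only to show the call.
        Map<String, Double> instance = new LinkedHashMap<>();
        instance.put("a4", 1500.0);
        instance.put("a6", 2.0);
        double p = probabilityClass1(instance);
        System.out.printf("P(class_1)=%.6f -> %s%n", p, p >= 0.5 ? "class_1" : "class_2");
    }
}
```

A coefficient of 0 gives an odds ratio of 1, meaning the attribute does not change the odds, which is consistent with a1, a2, a3 and a7 to a9 in the quoted tables.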