From: unmesha sreeveni <unmeshabiju@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 12 Nov 2013 12:54:24 +0530
Subject: Parallel SVM Implementation | Taking Long Time for Job Completion

I am trying to implement SVM on Hadoop, starting with the training phase. When I process large files (tested with 5000 records), the job takes about 30 minutes to complete. How can I increase the speed?

Hadoop: The Definitive Guide says:

"The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat's logical records are lines, which will cross HDFS boundaries more often than not. This has no bearing on the functioning of your program (lines are not missed or broken, for example), but it's worth knowing about, as it does mean that data-local maps (that is, maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not normally significant."

I am using

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

in my driver class.
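For more context, here is roughly what the driver looks like. This is a minimal sketch: SvmTrainDriver and SvmTrainMapper are placeholder names, and the SVM-specific training logic is omitted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Placeholder driver: only the I/O wiring shown here matches my real
    // job; the class names and the omitted map() body are illustrative.
    public class SvmTrainDriver {

        // With TextInputFormat the mapper is called once per line:
        // key = byte offset in the file, value = the line itself.
        public static class SvmTrainMapper
                extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                // parse one training record and do the SVM work here
                // (SVM-specific code omitted)
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // (Job.getInstance(conf, "svm train") on Hadoop 2.x)
            Job job = new Job(conf, "svm train");
            job.setJarByClass(SvmTrainDriver.class);
            job.setMapperClass(SvmTrainMapper.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }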
So in the mapper I get each line of the input as one record. Is that a reason my job is slowing down? How can I increase the speed? Any suggestions?

--
Thanks & Regards
Unmesha Sreeveni U.B
Junior Developer