hadoop-mapreduce-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject Parallel SVM Implementation | Taking Long time for JobCompletion
Date Tue, 12 Nov 2013 07:24:24 GMT
I am trying to implement SVM in Hadoop, specifically the training phase.
When I process large files (tested with 5,000 records), the job takes
about 30 minutes to complete.

How can I increase the speed?

In Hadoop: The Definitive Guide it says:

The logical records that FileInputFormats define do not usually fit neatly
into HDFS blocks. For example, a TextInputFormat’s logical records are
lines, which will cross HDFS boundaries more often than not. This has no
bearing on the functioning of your program—lines are not missed or broken,
for example—but it’s worth knowing about, as it does mean that data-local
maps (that is, maps that are running on the same host as their input data)
will perform some remote reads. The slight overhead this causes is not
normally significant.

In my driver class I am using

               job.setInputFormatClass(TextInputFormat.class);
               job.setOutputFormatClass(TextOutputFormat.class);

so in the mapper I receive one line of input per map() call. Is that a
reason my job is slow?
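
For reference, here is a minimal sketch of the kind of job setup I mean.
The class names (SvmTrainingDriver, SvmTrainingMapper, SvmTrainingReducer)
are placeholders, and the mapper/reducer bodies are stubs rather than my
actual training code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SvmTrainingDriver {

    // Stub mapper: with TextInputFormat, map() is called once per input
    // line, with the byte offset as the key and the line as the value.
    public static class SvmTrainingMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Real SVM training would parse the record and emit partial
            // model state here; this stub just passes the line through.
            context.write(new Text("record"), line);
        }
    }

    // Stub reducer: a real implementation would aggregate the partial
    // results (e.g. support vectors or gradients) from the mappers.
    public static class SvmTrainingReducer
            extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "svm-training");
        job.setJarByClass(SvmTrainingDriver.class);

        job.setMapperClass(SvmTrainingMapper.class);
        job.setReducerClass(SvmTrainingReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

(Note that with TextInputFormat the number of map tasks follows the number
of input splits, so a file that fits in a single HDFS block is handled by
a single mapper.)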

Any suggestions on how to speed this up?

-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*
