From: unmesha sreeveni <unmeshabiju@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 12 Nov 2013 12:54:24 +0530
Subject: Parallel SVM Implementation | Taking Long Time for Job Completion

I am trying to implement SVM on Hadoop, starting with the training phase. When I process large files (tested with 5000 records), the job takes about 30 minutes to complete. How can I increase the speed?

Hadoop: The Definitive Guide says:

"The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat's logical records are lines, which will cross HDFS boundaries more often than not. This has no bearing on the functioning of your program (lines are not missed or broken, for example), but it's worth knowing about, as it does mean that data-local maps (that is, maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not normally significant."

I am using

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

in my driver class.
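For more context, here is roughly what the driver looks like. This is a minimal sketch: SvmTrainDriver and SvmTrainMapper are placeholder names, and the SVM-specific training logic is omitted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Placeholder driver: only the I/O wiring shown here matches my real
    // job; the class names and the omitted map() body are illustrative.
    public class SvmTrainDriver {

        // With TextInputFormat the mapper is called once per line:
        // key = byte offset in the file, value = the line itself.
        public static class SvmTrainMapper
                extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                // parse one training record and do the SVM work here
                // (SVM-specific code omitted)
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // (Job.getInstance(conf, "svm train") on Hadoop 2.x)
            Job job = new Job(conf, "svm train");
            job.setJarByClass(SvmTrainDriver.class);
            job.setMapperClass(SvmTrainMapper.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }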
So in the mapper I get each line of the input as one record. Is that a reason my job is slowing down? How can I increase the speed? Any suggestions?

--
Thanks & Regards
Unmesha Sreeveni U.B
Junior Developer