From: Dieter De Witte <drdwitte@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 21 Oct 2013 09:09:36 +0200
Subject: Re: number of map and reduce task does not change in M/R program

Anseh,

Let's assume your job is fully scalable. Then it should take 100,000,000 / 600,000 = 1000 / 6, i.e. about 167 times as long as the first job. That is the ideal case; in practice it will probably be more like 200 times. Also, please use units and scientific notation in your questions: is it 10^8 records or 10^8 bytes?

Regards,
irW

2013/10/20 Anseh Danesh <anseh.danesh@gmail.com>:

OK... thanks a lot for the link... it is so useful... ;)

On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <amrnablus@gmail.com> wrote:

Try profiling the job (http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling). And yeah, the machine specs could be the reason; that's why Hadoop was invented in the first place ;)
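For reference, turning on that profiling from code looks roughly like this (a minimal sketch, assuming the classic org.apache.hadoop.mapred API that the linked 1.x tutorial covers; the class name is just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingExample {
        // Minimal sketch: enable the built-in HPROF profiling described in
        // the tutorial linked above (classic mapred API, Hadoop 1.x).
        public static JobConf withProfiling(JobConf conf) {
            conf.setProfileEnabled(true);           // mapred.task.profile=true
            conf.setProfileTaskRange(true, "0-2");  // profile map tasks 0-2
            conf.setProfileTaskRange(false, "0-2"); // profile reduce tasks 0-2
            // Per-task HPROF options; the framework substitutes the
            // per-task output file for %s.
            conf.setProfileParams(
                "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,"
                + "thread=y,verbose=n,file=%s");
            return conf;
        }
    }

The profiler output is then stored alongside the task logs in the user log directory, one file per profiled task attempt.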
On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <anseh.danesh@gmail.com> wrote:

I tried it on a small data set of about 600,000, and it did not take too long; the execution time was reasonable. But on the 100,000,000 data set it performs really badly. One more thing: I have 2 processors in my machine. I think this amount of data is just too big for my processor, and that is why it takes so long to process. What do you think?

On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <amrnablus@gmail.com> wrote:

Try running the job locally on a small set of the data and see if it takes too long. If so, your map code might have some performance issues.

On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <anseh.danesh@gmail.com> wrote:

Hi all, I have a question. I have a MapReduce program that gets its input from Cassandra. My input is a little big, about 100,000,000 data. My problem is that the program takes too long to process it, even though MapReduce is supposed to be good and fast for large volumes of data. So I think maybe I have a problem with the number of map and reduce tasks. I set the number of map and reduce tasks with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see any changes. In my logs it starts at map 0% reduce 0%, and after about 2 hours of work it shows map 1% reduce 0%! What should I do? Please help me, I am really confused...
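For what it's worth, the likely reason setting the map count appears to do nothing: in Hadoop the number of map tasks equals the number of input splits produced by the InputFormat, and mapred.map.tasks (or JobConf.setNumMapTasks) is only a hint. Only the reduce count is honored exactly. With a Cassandra input you influence the map count through the input split size. A minimal sketch (assumes the new mapreduce API plus Cassandra's ConfigHelper; the job name and numbers are placeholders):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TaskCountExample {
        public static Job configure(Configuration conf) throws Exception {
            Job job = new Job(conf, "cassandra-mr");
            // The reduce task count is honored exactly.
            job.setNumReduceTasks(4);
            // The map task count is NOT directly settable: it equals the
            // number of input splits. For Cassandra's ColumnFamilyInputFormat
            // the rows-per-split setting is what controls it (assumption:
            // Cassandra 1.x ConfigHelper API).
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 65536);
            return job;
        }
    }

If the split count is already reasonable and a map task still crawls (1% after 2 hours), the time is more likely going into per-record work in the mapper or the Cassandra read path than into task scheduling.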