From: Dieter De Witte <drdwitte@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 21 Oct 2013 09:09:36 +0200
Subject: Re: number of map and reduce task does not change in M/R program

Anseh,

Let's assume your job is fully scalable. Then it should take 100,000,000 / 600,000 = 1000 / 6, i.e. about 167 times as long as the first job. That is the ideal case; in practice it will probably be more like 200 times. Also, please use units and scientific notation in your questions: is it 10^8 records or 10^8 bytes?

Regards,
irW

2013/10/20 Anseh Danesh <anseh.danesh@gmail.com>:

OK... thanks a lot for the link... it is so useful... ;)

On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin <amrnablus@gmail.com> wrote:

Try profiling the job (http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling). And yeah, the machine specs could be the reason; that's why Hadoop was invented in the first place ;)
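For reference, turning on that profiling from code looks roughly like this (a minimal sketch, assuming the classic org.apache.hadoop.mapred API that the linked 1.x tutorial covers; the class name is just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingExample {
        // Minimal sketch: enable the built-in HPROF profiling described in
        // the tutorial linked above (classic mapred API, Hadoop 1.x).
        public static JobConf withProfiling(JobConf conf) {
            conf.setProfileEnabled(true);           // mapred.task.profile=true
            conf.setProfileTaskRange(true, "0-2");  // profile map tasks 0-2
            conf.setProfileTaskRange(false, "0-2"); // profile reduce tasks 0-2
            // Per-task HPROF options; the framework substitutes the
            // per-task output file for %s.
            conf.setProfileParams(
                "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,"
                + "thread=y,verbose=n,file=%s");
            return conf;
        }
    }

The profiler output is then stored alongside the task logs in the user log directory, one file per profiled task attempt.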
On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh <anseh.danesh@gmail.com> wrote:

I tried it on a small data set of about 600,000, and it did not take too long; the execution time was reasonable. But on the 100,000,000 data set it performs really badly. One more thing: I have 2 processors in my machine. I think this amount of data is just too big for my processor, and that is why it takes so long to process. What do you think?

On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin <amrnablus@gmail.com> wrote:

Try running the job locally on a small set of the data and see if it takes too long. If so, your map code might have some performance issues.

On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh <anseh.danesh@gmail.com> wrote:

Hi all, I have a question. I have a MapReduce program that gets its input from Cassandra. My input is a little big, about 100,000,000 data. My problem is that the program takes too long to process it, even though MapReduce is supposed to be good and fast for large volumes of data. So I think maybe I have a problem with the number of map and reduce tasks. I set the number of map and reduce tasks with JobConf, with Job, and also in conf/mapred-site.xml, but I don't see any changes. In my logs it starts at map 0% reduce 0%, and after about 2 hours of work it shows map 1% reduce 0%! What should I do? Please help me, I am really confused...
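For what it's worth, the likely reason setting the map count appears to do nothing: in Hadoop the number of map tasks equals the number of input splits produced by the InputFormat, and mapred.map.tasks (or JobConf.setNumMapTasks) is only a hint. Only the reduce count is honored exactly. With a Cassandra input you influence the map count through the input split size. A minimal sketch (assumes the new mapreduce API plus Cassandra's ConfigHelper; the job name and numbers are placeholders):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TaskCountExample {
        public static Job configure(Configuration conf) throws Exception {
            Job job = new Job(conf, "cassandra-mr");
            // The reduce task count is honored exactly.
            job.setNumReduceTasks(4);
            // The map task count is NOT directly settable: it equals the
            // number of input splits. For Cassandra's ColumnFamilyInputFormat
            // the rows-per-split setting is what controls it (assumption:
            // Cassandra 1.x ConfigHelper API).
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 65536);
            return job;
        }
    }

If the split count is already reasonable and a map task still crawls (1% after 2 hours), the time is more likely going into per-record work in the mapper or the Cassandra read path than into task scheduling.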