From: Rob Stewart <robstewart57@googlemail.com>
Date: Sat, 11 Dec 2010 14:11:29 +0000
Subject: Re: Slow final few reducers
To: common-user@hadoop.apache.org

Sorry, my fault - it's someone running a network simulator on the
cluster!

Rob

On 11 December 2010 14:09, Rob Stewart wrote:
> OK, slight update:
>
> Immediately underneath public void reduce(), I have added:
>
>     System.out.println("Key: " + key.toString());
>
> And I am logged on to a node that is still working on a reducer.
> However, it stopped printing "Key:" long ago, so it is not processing
> new keys.
>
> But looking more closely at "top" on this node, there are *two* Linux
> processes going at 100% CPU. The first is java, which, using "jps -l",
> I realize is "Child"; but the second is a process called "setdest",
> which I strongly suspect has to do with my Hadoop job.
>
> What is "setdest", and what is it actually doing? And why is it taking
> so long?
>
> cheers,
>
> Rob Stewart
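For context, the debug line above lives inside a reduce() shaped roughly
like this. A minimal sketch against the old org.apache.hadoop.mapred API
of the time; the class name DebugReducer and the Text/IntWritable key and
value types are illustrative assumptions, not details from Rob's actual
job:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical reducer with the debug print described above.
    public class DebugReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        // Logs each key as it arrives, so a silent log means no new
        // keys are reaching this reducer.
        System.out.println("Key: " + key.toString());

        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

The println output goes to the task attempt's stdout log (typically under
${hadoop.log.dir}/userlogs/ on the worker node, and visible from the task
details page in the JobTracker web UI), so silence there means reduce()
is not being handed new keys.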
> On 11 December 2010 12:26, Harsh J wrote:
>> On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart wrote:
>>> Oh,
>>>
>>> I should add: of the Java processes running on the remaining nodes
>>> for the final wave of reducers, the one taking all the CPU is the
>>> "Child" process (not TaskTracker). I log into the Master, and there
>>> too, the Java process taking all the CPU is "Child".
>>>
>>> Is this normal?
>>
>> Yes, "Child" is the Task JVM.
>>
>>> thanks,
>>> Rob
>>>
>>> On 11 December 2010 11:38, Rob Stewart wrote:
>>>> Hi, many thanks for your response.
>>>>
>>>> A few observations:
>>>> - I know for a fact that my key distribution is quite radically
>>>>   skewed (some keys with *many* values, most keys with few).
>>>> - I have overlooked the fact that I need a partitioner. I suspect
>>>>   that this will help dramatically.
>>>>
>>>> I realize that the number of partitions should equal the number of
>>>> reducers (e.g. 100).
>>>>
>>>> So if these are my <key>,<count> pairs (where the count is the
>>>> number of values per key):
>>>>
>>>>   <A>,<500>
>>>>   <B>,<1000>
>>>>   <C>,<20>
>>>>   <D>,<1>
>>>>
>>>> and I have 3 reducers, how do I make:
>>>>
>>>>   Reducer-1: <A>
>>>>   Reducer-2: <B>
>>>>   Reducer-3: <C> & <D>
>>>>
>>>> thanks,
>>>>
>>>> Rob
>>>>
>>>> On 11 December 2010 11:12, Harsh J wrote:
>>>>> Hi,
>>>>>
>>>>> Certain reducers may receive a higher share of data than others
>>>>> (depending on your data/key distribution, the partition function,
>>>>> etc.). Compare the longer reduce tasks' counters with the quicker
>>>>> ones'.
>>>>>
>>>>> Are you sure that the reducers that take long are definitely the
>>>>> last wave, as in with IDs of 180-200 (and not a random bunch of
>>>>> reduce tasks taking longer)?
>>>>>
>>>>> Also take a look at the logs, and at the machines that run these
>>>>> particular reducers -- ensure nothing is wrong on them.
>>>>>
>>>>> There's nothing specifically written into Hadoop that makes the
>>>>> "last wave" of reduce tasks take longer. Each reducer writes to
>>>>> its own file and is completely independent.
>>>>>
>>>>> --
>>>>> Harsh J
>>>>> www.harshj.com
>>
>> --
>> Harsh J
>> www.harshj.com
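One way to get the placement Rob asks about is a custom Partitioner that
pins the known-heavy keys to dedicated reducers and hashes everything
else over the remainder. A sketch against the old
org.apache.hadoop.mapred API; the name SkewAwarePartitioner and the
literal keys "A" and "B" are stand-ins for whatever the real heavy keys
are:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Hypothetical partitioner: gives the two heavy keys their own
    // reducers and hash-partitions the light keys over the rest.
    public class SkewAwarePartitioner
        implements Partitioner<Text, IntWritable> {

      public void configure(JobConf job) {
        // No per-job configuration needed for this sketch.
      }

      public int getPartition(Text key, IntWritable value,
                              int numPartitions) {
        String k = key.toString();
        int hash = k.hashCode() & Integer.MAX_VALUE; // force non-negative
        if (numPartitions < 3) {
          // Too few reducers to dedicate any; fall back to plain hashing.
          return hash % numPartitions;
        }
        if (k.equals("A")) return 0;  // ~500-value key  -> Reducer-1
        if (k.equals("B")) return 1;  // ~1000-value key -> Reducer-2
        // Every other (light) key shares the remaining reducers.
        return 2 + hash % (numPartitions - 2);
      }
    }

Wiring it in is one call on the JobConf, e.g.
conf.setPartitionerClass(SkewAwarePartitioner.class) next to
conf.setNumReduceTasks(3). One caveat: all values for a single key still
arrive at one reduce() call, so a 1000-value key puts a floor on how long
its reducer runs no matter how the other keys are spread.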