From: Rob Stewart <robstewart57@googlemail.com>
Date: Sat, 11 Dec 2010 14:11:29 +0000
Subject: Re: Slow final few reducers
To: common-user@hadoop.apache.org

Sorry, my fault - it's someone running a network simulator on the
cluster!

Rob

On 11 December 2010 14:09, Rob Stewart wrote:
> OK, slight update:
>
> Immediately underneath public void reduce(), I have added:
>
>     System.out.println("Key: " + key.toString());
>
> And I am logged on to a node that is still working on a reducer.
> However, it stopped printing "Key:" long ago, so it is not processing
> new keys.
>
> But looking more closely at "top" on this node, there are *two* Linux
> processes going at 100% CPU. The first is java, which, using "jps -l",
> I realize is "Child"; but the second is a process called "setdest",
> which I strongly suspect has to do with my Hadoop job.
>
> What is "setdest", and what is it actually doing? And why is it taking
> so long?
>
> cheers,
>
> Rob Stewart
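For context, the debug line above lives inside a reduce() shaped roughly
like this. A minimal sketch against the old org.apache.hadoop.mapred API
of the time; the class name DebugReducer and the Text/IntWritable key and
value types are illustrative assumptions, not details from Rob's actual
job:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical reducer with the debug print described above.
    public class DebugReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        // Logs each key as it arrives, so a silent log means no new
        // keys are reaching this reducer.
        System.out.println("Key: " + key.toString());

        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

The println output goes to the task attempt's stdout log (typically under
${hadoop.log.dir}/userlogs/ on the worker node, and visible from the task
details page in the JobTracker web UI), so silence there means reduce()
is not being handed new keys.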
> On 11 December 2010 12:26, Harsh J wrote:
>> On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart wrote:
>>> Oh,
>>>
>>> I should add: of the Java processes running on the remaining nodes
>>> for the final wave of reducers, the one taking all the CPU is the
>>> "Child" process (not TaskTracker). I log into the Master, and there
>>> too, the Java process taking all the CPU is "Child".
>>>
>>> Is this normal?
>>
>> Yes, "Child" is the Task JVM.
>>
>>> thanks,
>>> Rob
>>>
>>> On 11 December 2010 11:38, Rob Stewart wrote:
>>>> Hi, many thanks for your response.
>>>>
>>>> A few observations:
>>>> - I know for a fact that my key distribution is quite radically
>>>>   skewed (some keys with *many* values, most keys with few).
>>>> - I have overlooked the fact that I need a partitioner. I suspect
>>>>   that this will help dramatically.
>>>>
>>>> I realize that the number of partitions should equal the number of
>>>> reducers (e.g. 100).
>>>>
>>>> So if these are my <key>,<count> pairs (where the count is the
>>>> number of values per key):
>>>>
>>>>   <A>,<500>
>>>>   <B>,<1000>
>>>>   <C>,<20>
>>>>   <D>,<1>
>>>>
>>>> and I have 3 reducers, how do I make:
>>>>
>>>>   Reducer-1: <A>
>>>>   Reducer-2: <B>
>>>>   Reducer-3: <C> & <D>
>>>>
>>>> thanks,
>>>>
>>>> Rob
>>>>
>>>> On 11 December 2010 11:12, Harsh J wrote:
>>>>> Hi,
>>>>>
>>>>> Certain reducers may receive a higher share of data than others
>>>>> (depending on your data/key distribution, the partition function,
>>>>> etc.). Compare the longer reduce tasks' counters with the quicker
>>>>> ones'.
>>>>>
>>>>> Are you sure that the reducers that take long are definitely the
>>>>> last wave, as in with IDs of 180-200 (and not a random bunch of
>>>>> reduce tasks taking longer)?
>>>>>
>>>>> Also take a look at the logs, and at the machines that run these
>>>>> particular reducers -- ensure nothing is wrong on them.
>>>>>
>>>>> There's nothing specifically written into Hadoop that makes the
>>>>> "last wave" of reduce tasks take longer. Each reducer writes to
>>>>> its own file and is completely independent.
>>>>>
>>>>> --
>>>>> Harsh J
>>>>> www.harshj.com
>>
>> --
>> Harsh J
>> www.harshj.com
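One way to get the placement Rob asks about is a custom Partitioner that
pins the known-heavy keys to dedicated reducers and hashes everything
else over the remainder. A sketch against the old
org.apache.hadoop.mapred API; the name SkewAwarePartitioner and the
literal keys "A" and "B" are stand-ins for whatever the real heavy keys
are:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Hypothetical partitioner: gives the two heavy keys their own
    // reducers and hash-partitions the light keys over the rest.
    public class SkewAwarePartitioner
        implements Partitioner<Text, IntWritable> {

      public void configure(JobConf job) {
        // No per-job configuration needed for this sketch.
      }

      public int getPartition(Text key, IntWritable value,
                              int numPartitions) {
        String k = key.toString();
        int hash = k.hashCode() & Integer.MAX_VALUE; // force non-negative
        if (numPartitions < 3) {
          // Too few reducers to dedicate any; fall back to plain hashing.
          return hash % numPartitions;
        }
        if (k.equals("A")) return 0;  // ~500-value key  -> Reducer-1
        if (k.equals("B")) return 1;  // ~1000-value key -> Reducer-2
        // Every other (light) key shares the remaining reducers.
        return 2 + hash % (numPartitions - 2);
      }
    }

Wiring it in is one call on the JobConf, e.g.
conf.setPartitionerClass(SkewAwarePartitioner.class) next to
conf.setNumReduceTasks(3). One caveat: all values for a single key still
arrive at one reduce() call, so a 1000-value key puts a floor on how long
its reducer runs no matter how the other keys are spread.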