Subject: Re: Some mappers are much slower than others in reading data from HDFS
From: Andy Isaacson <adi@cloudera.com>
To: user@hadoop.apache.org
Date: Tue, 8 Jan 2013 13:45:39 -0800

Your output shows that node2 has 13 mappers and the reducer, while
node3 and node4 had only 8 mappers each. So I'd expect some disparity.
Since it's hard to correlate the mapper throughput against the reducer
throughput, it's possible that node3 got just as much work done.

That doesn't explain why node4 is slower than node3, though.

-andy

On Mon, Jan 7, 2013 at 7:07 PM, Chen, Haifeng wrote:
> Dear sir,
>
> I encountered a strange problem: sometimes (not always), all the mappers
> on some nodes are much slower than the mappers on other nodes, as shown
> below. I didn't see any reason why they should slow down in this pattern.
>
> 000013(MAP on node4): --------(8.115)
> 000014(MAP on node4): --------(8.570)
> 000011(MAP on node4): --------(8.5)
> 000016(MAP on node4): --------(8.344)
> 000010(MAP on node4): --------(8.585)
> 000015(MAP on node4): --------(8.179)
> 000017(MAP on node4): --------(8.445)
> 000012(MAP on node4): --------(8.312)
> 000018(MAP on node2): ---(3.367)
> 000020(MAP on node2): ---(3.335)
> 000019(MAP on node2): ---(3.320)
> 000023(MAP on node2): ---(3.91)
> 000022(MAP on node2): ---(3.371)
> 000021(MAP on node2): ---(3.458)
> 000004(MAP on node3): -------------------(19.624)
> 000007(MAP on node3): -------------------(19.92)
> 000005(MAP on node3): --------------------(20.613)
> 000008(MAP on node3): --------------------(20.316)
> 000003(MAP on node3): --------------------(20.574)
> 000006(MAP on node3): --------------------(20.654)
> 000002(MAP on node3): -------------------(19.935)
> 000009(MAP on node3): --------------------(20.489)
> 000025(MAP on node2): --(2.877)
> 000026(MAP on node2): ---(3.112)
> 000027(MAP on node2): --(2.959)
> 000024(MAP on node2): --(2.845)
> 000029(MAP on node2): --(2.863)
> 000028(MAP on node2): --(2.933)
> 000031(MAP on node2): --(2.596)
> 000030(RED on node2): -------------(13.378)
>
> The testing is as follows:
>
> I have a 4-node cluster, and all of the nodes have the same hardware and
> software configurations. One node acts as name node and yarn resource
> manager. The other three nodes act as both data node and yarn node manager.
>
> The test input file is around 7GB on the HDFS cluster, and the
> replication number is 3.
> (This means that each data node has a copy of every block of the file.)
>
> The mapper did nothing and didn't write out any records:
>
>     public static class KeyMapper
>             extends Mapper{
>         public void map(Object key, Text value, Context context
>                 ) throws IOException, InterruptedException {
>         }
>     }
>
> So this mapper logically reads and iterates through its split of data
> and then finishes the job.
>
> I didn't see any factors in the above configurations that would cause
> the above phenomenon.
>
> I turned on the debug log for each mapper task, and it also showed that
> all the mappers' DFSClients read data from their local data nodes.
>
> Can any experts give some hints on this? I attached the log and client
> code for analysis.
>
> Thanks,
> Haifeng
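The per-node disparity Andy points out can be made concrete by summing the map-task seconds per node from the trace above. A minimal sketch in plain Java follows; the class name and the line-parsing regex are illustrative assumptions and are not part of the attached client code from the thread:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: sums MAP task runtimes per node from trace lines
// shaped like "000013(MAP on node4): --------(8.115)".
public class NodeTotals {
    private static final Pattern LINE = Pattern.compile(
            "\\d{6}\\((MAP|RED) on (\\w+)\\): -*\\((\\d+(?:\\.\\d+)?)\\)");

    public static Map<String, Double> mapSecondsPerNode(List<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            // Only MAP tasks count toward map time; RED (reduce) is skipped.
            if (m.find() && m.group(1).equals("MAP")) {
                totals.merge(m.group(2), Double.parseDouble(m.group(3)), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
                "000013(MAP on node4): --------(8.115)",
                "000018(MAP on node2): ---(3.367)",
                "000004(MAP on node3): -------------------(19.624)",
                "000030(RED on node2): -------------(13.378)");
        System.out.println(mapSecondsPerNode(trace));
    }
}
```

Run over the full trace, this would show whether node2's many short tasks add up to roughly the same total map time as node3's and node4's fewer, longer tasks, which is the comparison the reply suggests is hard to eyeball.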