From: Brahma Reddy Battula
To: user@hadoop.apache.org
Subject: RE: Huge disk IO on only one disk
Date: Mon, 3 Mar 2014 05:14:34 +0000

It seems you started the cluster with default values for the following two properties and configured only hadoop.tmp.dir:

dfs.datanode.data.dir ---> file://${hadoop.tmp.dir}/dfs/data (default value)
>>>> Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, data will be stored in all the named directories, typically on different devices.

yarn.nodemanager.local-dirs --> ${hadoop.tmp.dir}/nm-local-dir (default value)
>>>> Used to store localized files; these are essentially intermediate files produced while jobs run.

Please configure both of the above properties with multiple directories.
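For example, assuming your ten data disks are mounted at /data1 through /data10 (these mount points are placeholders; substitute your actual ones, and extend the lists to cover every disk), the entries in hdfs-site.xml and yarn-site.xml could look like this sketch:

<!-- hdfs-site.xml: spread DataNode block storage across the data disks,
     one comma-separated entry per disk mount point. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/dfs/data,file:///data2/dfs/data,file:///data3/dfs/data</value>
</property>

<!-- yarn-site.xml: spread NodeManager localized/intermediate files the same way. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/nm-local-dir,/data2/nm-local-dir,/data3/nm-local-dir</value>
</property>

After restarting the DataNode and NodeManager, new block writes and container-local files should be distributed across all the listed directories (existing blocks will stay where they are).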
Thanks & Regards
Brahma Reddy Battula

________________________________
From: Siddharth Tiwari [siddharth.tiwari@live.com]
Sent: Monday, March 03, 2014 5:58 AM
To: Users Hadoop
Subject: Huge disk IO on only one disk

Hi Team,

I have 10 disks over which I am running my HDFS. Out of these, disk5 is where I have my hadoop.tmp.dir configured. I see that this disk has huge IO when I run my jobs compared to the other disks. Can you guide me to the standards to follow so that this IO can be distributed across the other disks as well?
What should be the standard around setting up the hadoop.tmp.dir parameter?
Any help would be highly appreciated. Below is the IO while I am running a huge job.

Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
sda               2.11        37.65       226.20   313512628   1883809216
sdb               1.47        96.44       152.48   803144582   1269829840
sdc               1.45        93.03       153.10   774765734   1274979080
sdd               1.46        95.06       152.73   791690022   1271944848
sde               1.47        92.70       153.24   772025750   1276195288
sdf               1.55        95.77       153.06   797567654   1274657320
sdg              10.10       364.26      1951.79  3033537062  16254346480
sdi               1.46        94.82       152.98   789646630   1274014936
sdh               1.44        94.09       152.57   783547390   1270598232
sdj               1.44        91.94       153.37   765678470   1277220208
sdk               1.52        97.01       153.02   807928678   1274300360

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"