Subject: Re: HDFS file system size issue
From: Abdelrahman Shettia <ashettia@hortonworks.com>
Date: Mon, 14 Apr 2014 14:27:50 -0700
To: user@hadoop.apache.org
Cc: Hive user group

Hi Biswa,

Are you sure that the replication factor of the files is three? Please run 'hadoop fsck / -blocks -files -locations' and check the replication factor for each file. Also, post your configuration of dfs.datanode.du.reserved, and please check the real space reported by each DataNode by running 'du -h'.
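For example, something along these lines (the DataNode data-directory path and the reserved-space value below are only placeholders, adjust them for your cluster):

    # Per-file replication factor and block locations
    hadoop fsck / -blocks -files -locations | less

    # Logical size (no replication) vs. what the NameNode reports
    hadoop dfs -dus /
    hadoop dfsadmin -report

    # On each DataNode, see what is actually on disk (paths are hypothetical)
    du -sh /data/hadoop/dfs /data/hadoop/logs

    # Reserved non-DFS space per volume in hdfs-site.xml (value is an example: 10 GB)
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>
    </property>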

Thanks,
Rahman

On Apr 14, 2014, at 2:07 PM, Saumitra <saumitra.official@gmail.com> wrote:
Hello,

Biswajit, it looks like there is some confusion in the calculation: 1TB would be equal to 1024GB, not 114GB.


Sandeep, I checked the log directory size as well. The log directories are only a few GB; I have configured the log4j properties so that the logs won't grow too large.

On our slave machines we have a 450GB disk partition for Hadoop logs and DFS. On that partition the logs directory is < 10GB and the rest of the space is occupied by DFS. A separate 10GB partition is used for /.
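For reference, this is roughly how the split can be verified on a slave (the mount points and paths below are placeholders rather than our exact layout):

    df -h /                       # the 10GB root partition
    df -h /data                   # the 450GB partition holding logs + DFS
    du -sh /data/hadoop/logs      # hypothetical log directory
    du -sh /data/hadoop/dfs       # hypothetical dfs.data.dir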

Let me quote my point of confusion once again:

> Basically I wanted to point out a discrepancy between the name node status page and hadoop dfs -dus. In my case, the former reports DFS usage as 1TB and the latter reports it as 35GB. What factors can cause this difference? And why is just 35GB of data causing DFS to hit its limits?


I am talking about the name node status page on port 50070. Here is a screenshot of my name node status page:

<Screen Shot 2014-04-15 at 2.07.19 am.png>

As I understand it, 'DFS Used' is the space taken by DFS, and non-DFS used is the space taken by non-DFS data such as logs or other local files from users. The namenode shows that DFS Used is ~1TB, but hadoop dfs -dus shows it to be ~38GB.
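The two numbers can be compared straight from the command line; a minimal sketch, using only the stock commands already mentioned in this thread:

    # Logical file-system size, i.e. bytes before replication
    hadoop dfs -dus /

    # Cluster summary as the NameNode sees it: configured capacity,
    # DFS Used (physical, including replicas), Non DFS Used, per-node usage
    hadoop dfsadmin -report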



On 14-Apr-2014, at 12:33 pm, Sandeep Nemuri <nhsandeep6@gmail.com> wrote:

Please check your logs directory usage.



On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak <biswajit.nayak@inmobi.com> wrote:
What's the replication factor you have? I believe it should be 3. hadoop dfs -dus shows the disk usage without replication, while the name node UI page shows it with replication.

38GB * 3 = 114GB ~ 1TB

~Biswa
-----oThe important thing is not to stop questioning o-----


On Mon, Apr 14, 2014 at 9:38 AM, Saumitra <saumitra.official@gmail.com> wrote:
Hi Biswajit,

Non-DFS usage is ~100GB over the cluster, but the numbers are still nowhere near 1TB.

Basically I wanted to point out a discrepancy between the name node status page and hadoop dfs -dus. In my case, the former reports DFS usage as 1TB and the latter reports it as 35GB. What factors can cause this difference? And why is just 35GB of data causing DFS to hit its limits?




On 14-Apr-2014, at 8:31 am, Biswajit Nayak <biswajit.nayak@inmobi.com> wrote:

Hi Saumitra,

Could you please check the non-DFS usage? It also contributes to filling up the disk space.


~Biswa
-----oThe important thing is not to stop questioning o-----


On Mon, Apr 14, 2014 at 1:24 AM, Saumitra <saumitra.official@gmail.com> wrote:
Hello,

We are running HDFS on a 9-node Hadoop cluster; the Hadoop version is 1.2.1. We are using the default HDFS block size.

We have noticed that the disks of the slaves are almost full. From the name node's status page (namenode:50070), we could see that the disks of live nodes are 90% full and DFS Used in the cluster summary page is ~1TB.

However, hadoop dfs -dus / shows that the file system size is merely 38GB. The 38GB number looks correct because we keep only a few Hive tables and Hadoop's /tmp (distributed cache and job outputs) in HDFS; all other data is cleaned up. I cross-checked this with hadoop dfs -ls. I also think there is no internal fragmentation, because the files in our Hive tables are well-chopped into ~50MB chunks. Here are the last few lines of hadoop fsck / -files -blocks:

Status: HEALTHY
 Total size:    38086441332 B
 Total dirs:    232
 Total files:   802
 Total blocks (validated):      796 (avg. block size 47847288 B)
 Minimally replicated blocks:   796 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       6 (0.75376886 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     3.0439699
 Corrupt blocks:                0
 Missing replicas:              6 (0.24762692 %)
 Number of data-nodes:          9
 Number of racks:               1
FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds
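A rough sanity check on the fsck numbers above (plain multiplication, nothing extra assumed): even counting every replica, the physical footprint implied by fsck is far below 1TB.

    logical size          38086441332 B  (~38 GB)
    average replication   3.0439699
    physical estimate     38086441332 B * 3.0439699 ~= 116 GB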


My question is: why are the disks of the slaves getting full even though there are only a few files in DFS?








--
--Regards
  Sandeep Nemuri


