hadoop-mapreduce-user mailing list archives

From Sandeep Nemuri <nhsande...@gmail.com>
Subject Re: HDFS file system size issue
Date Mon, 14 Apr 2014 07:03:46 GMT
Please check your logs directory usage.
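For example (the log location varies by install; commonly $HADOOP_HOME/logs,
or wherever HADOOP_LOG_DIR points):

    # total size of the Hadoop daemon and task logs on a slave
    du -sh $HADOOP_HOME/logs
    # per-job task logs often dominate
    du -sh $HADOOP_HOME/logs/userlogs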



On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak
<biswajit.nayak@inmobi.com> wrote:

> What's the replication factor you have? I believe it should be 3. hadoop
> dfs -dus shows the disk usage without replication, while the name node UI
> page shows it with replication.
>
> 38 GB * 3 = 114 GB, not ~1 TB.
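>
> One way to verify (commands available in Hadoop 1.x; the conf path below is
> the usual default):
>
>     # configured default replication factor (if unset, the default is 3)
>     grep -A1 'dfs.replication' $HADOOP_HOME/conf/hdfs-site.xml
>     # actual default and average replication across existing blocks
>     hadoop fsck / | grep -i replication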
>
> ~Biswa
> -----o The important thing is not to stop questioning o-----
>
>
> On Mon, Apr 14, 2014 at 9:38 AM, Saumitra <saumitra.official@gmail.com> wrote:
>
>> Hi Biswajit,
>>
>> Non-DFS usage is ~100 GB over the cluster, but the numbers are still
>> nowhere near 1 TB.
>>
>> Basically I wanted to point out the discrepancy between the name node
>> status page and hadoop dfs -dus: the former reports DFS usage as 1 TB,
>> while the latter reports it as 35 GB. What factors can cause this
>> difference? And why is just 35 GB of data causing DFS to hit its limits?
>>
>>
>>
>>
>> On 14-Apr-2014, at 8:31 am, Biswajit Nayak <biswajit.nayak@inmobi.com>
>> wrote:
>>
>> Hi Saumitra,
>>
>> Could you please check the non-DFS usage? It also contributes to filling
>> up the disk space.
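>>
>> One way to see the split per datanode (Hadoop 1.x):
>>
>>     hadoop dfsadmin -report
>>
>> Its per-node section lists Configured Capacity, DFS Used and Non DFS Used,
>> so you can see which of the two is actually eating the disks.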
>>
>>
>>
>> ~Biswa
>> -----o The important thing is not to stop questioning o-----
>>
>>
>> On Mon, Apr 14, 2014 at 1:24 AM, Saumitra <saumitra.official@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We are running HDFS on a 9-node hadoop cluster, hadoop version 1.2.1,
>>> with the default HDFS block size.
>>>
>>> We have noticed that the slaves' disks are almost full. From the name
>>> node's status page (namenode:50070), we can see that the disks of live
>>> nodes are 90% full and that DFS Used in the cluster summary is ~1 TB.
>>>
>>> However, hadoop dfs -dus / shows that the file system size is merely
>>> 38 GB. The 38 GB number looks correct, because we keep only a few Hive
>>> tables and hadoop's /tmp (distributed cache and job outputs) in HDFS;
>>> all other data is cleaned up. I cross-checked this with hadoop dfs -ls.
>>> I also don't think there is internal fragmentation, because the files in
>>> our Hive tables are well-chopped into ~50 MB chunks. Here are the last
>>> few lines of hadoop fsck / -files -blocks:
>>>
>>> Status: HEALTHY
>>>  Total size: 38086441332 B
>>>  Total dirs: 232
>>>  Total files: 802
>>>  Total blocks (validated): 796 (avg. block size 47847288 B)
>>>  Minimally replicated blocks: 796 (100.0 %)
>>>  Over-replicated blocks: 0 (0.0 %)
>>>  Under-replicated blocks: 6 (0.75376886 %)
>>>  Mis-replicated blocks: 0 (0.0 %)
>>>  Default replication factor: 2
>>>  Average block replication: 3.0439699
>>>  Corrupt blocks: 0
>>>  Missing replicas: 6 (0.24762692 %)
>>>  Number of data-nodes: 9
>>>  Number of racks: 1
>>> FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds
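>>>
>>> (For scale: the logical size times the average replication gives the
>>> expected raw usage, 38086441332 B * 3.0439699 ≈ 116 GB, so replication
>>> alone cannot account for ~1 TB of DFS Used.)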
>>>
>>>
>>> My question is: why are the slaves' disks filling up even though there
>>> are only a few files in DFS?
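>>>
>>> One way to check where the space is actually going on a slave (the paths
>>> below are illustrative; substitute your dfs.data.dir, mapred.local.dir
>>> and log directory):
>>>
>>>     du -sh /data/dfs/current      # HDFS block files (DFS used)
>>>     du -sh /data/mapred/local     # MapReduce intermediate data (non-DFS)
>>>     du -sh $HADOOP_HOME/logs      # daemon and task logs (non-DFS)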
>>>
>>
>>
>>
>>
>>
>
>



-- 
Regards,
Sandeep Nemuri
