hadoop-hdfs-user mailing list archives

From Guru Prateek Pinnadhari <gpinnadh...@zaloni.com>
Subject Getting HDFS dir/file/block counts from CLI
Date Fri, 24 Feb 2017 22:51:46 GMT
Hi,

1.

I noticed that the output of "hadoop fs -count /" is way off when compared
to the metrics reported in the NameNode JMX or on the NameNode UI.


Stats from NameNode UI: Total files and dirs = ~30 million



Stats from CLI: Total files and dirs = ~4.2 million

$ hadoop fs -count -v -h /

   DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME

     396.5 K        3.9 M            240.6 G /


Why is there a difference of over 25 million files and directories, and a
size difference of 664 TB vs 240 GB?
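
For reference, the numbers on the JMX side come from the NameNode's /jmx
servlet, roughly like this (the hostname is a placeholder and 50070 is just
the default HTTP port on my version; FilesTotal/BlocksTotal are the
attributes I've been reading):

$ curl -s 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -E 'FilesTotal|BlocksTotal'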


I found that "hadoop fs -count" fetches info from the specified path's "
org.apache.hadoop.fs.ContentSummary
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/fs/ContentSummary.java#ContentSummary>"
class which seems to track variables called directoryCount, fileCount,
spaceConsumed etc.

Are these values always up to date? It doesn't seem like it. Can I run some
command to force an update?
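
For what it's worth, the quota variant of the same command appears to read
the same summary fields, so I don't think it forces a refresh either:

$ hadoop fs -count -q -v -h /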


2.

How can I find the total number of blocks used, across the entire HDFS, from the CLI?

I think I can get it from "hdfs fsck", but I'd rather not run that, as it
seems to be very resource-intensive and often times out or errors out on a
big cluster.
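
For reference, this is the fsck route I mentioned; it does print a
"Total blocks (validated):" line at the end, but it's the expensive
full-namespace scan I'd like to avoid:

$ hdfs fsck / | grep -i 'total blocks'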




-- 
Thanks,
Guru
