hadoop-common-user mailing list archives

From "Pol, Daniel (BigData)" <daniel....@hpe.com>
Subject RE: HDFS - How to delete orphaned blocks
Date Fri, 24 Mar 2017 16:19:01 GMT
Tried that already. Went even up to 1.0f. Also tried different values for dfs.block.invalidate.limit,
without impact.
Hoping for something similar to the “expunge” command that would clear HDFS of all orphaned blocks.

From: Harsh J [mailto:harsh@cloudera.com]
Sent: Friday, March 24, 2017 11:16 AM
To: Pol, Daniel (BigData) <daniel.pol@hpe.com>; user@hadoop.apache.org
Subject: Re: HDFS - How to delete orphaned blocks

The rate of deletion of DN blocks is throttled via dfs.namenode.invalidate.work.pct.per.iteration
(documented at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.namenode.invalidate.work.pct.per.iteration).
If your problem is the rate, and your usage is such that you generate and delete a lot of data
quickly, you can consider increasing the percentage represented by this value and restarting your NameNode.

P.s. Going too high may require raising heap sizes, so keep an eye on JVM heap usage
across the NN and DNs after the change.
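A rough back-of-envelope for what that percentage buys you: per replication iteration the NameNode schedules roughly ceil(liveDNs × work.pct) nodes, and each scheduled DN is told to drop at most dfs.block.invalidate.limit blocks per heartbeat. The property defaults below are from hdfs-default.xml; the live-DN count is a made-up example, so check your own cluster's values.

```shell
#!/bin/sh
# Back-of-envelope sketch of the DN block-invalidation rate (assumed defaults).
LIVE_DNS=16            # number of live datanodes (example value)
WORK_PCT=32            # dfs.namenode.invalidate.work.pct.per.iteration * 100 (default 0.32)
INVALIDATE_LIMIT=1000  # dfs.block.invalidate.limit (default 1000)
HEARTBEAT_SECS=3       # dfs.heartbeat.interval (default 3 seconds)

# Nodes scheduled per replication iteration: ceil(LIVE_DNS * pct)
NODES_PER_ITER=$(( (LIVE_DNS * WORK_PCT + 99) / 100 ))
# Upper bound on blocks invalidated per heartbeat round:
BLOCKS_PER_ROUND=$(( NODES_PER_ITER * INVALIDATE_LIMIT ))
echo "nodes/iter=$NODES_PER_ITER blocks/round=$BLOCKS_PER_ROUND (~every ${HEARTBEAT_SECS}s)"
```

With these example numbers, raising the pct toward 1.0 roughly triples the nodes scheduled per iteration, which is why heap pressure on the NN/DNs is worth watching.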

On Fri, 24 Mar 2017 at 21:42 Pol, Daniel (BigData) <daniel.pol@hpe.com> wrote:
Hi !

Is there a way to delete “orphaned” blocks? I see this happening quite often if I change
the HDFS storage policy and recreate data, or if a datanode fails and the data on it is “old”
but not old enough. After a few days it goes away by itself, but I need a way to manually trigger
it or make it faster. Right now I have to write scripts to detect the orphaned blocks and
delete them manually outside Hadoop, or reformat my HDFS.
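For anyone curious what such a detection script can look like, here is a minimal sketch of the idea: take the set of block IDs the NameNode reports (e.g. parsed out of `hdfs fsck / -files -blocks`), take the block files actually present under a DataNode's data directories, and diff the two. The toy lists below stand in for both sides, and the temp-file paths are just examples.

```shell
#!/bin/sh
# Toy stand-ins: in practice these lists would come from parsed fsck output
# and from `find <dfs.datanode.data.dir> -name 'blk_*'` respectively.
printf 'blk_1001\nblk_1002\n' | sort > /tmp/nn_blocks.txt
printf 'blk_1001\nblk_1002\nblk_1003\n' | sort > /tmp/dn_blocks.txt

# comm -13 prints lines only in the second file, i.e. blocks on disk
# that the NameNode does not know about.
ORPHANS=$(comm -13 /tmp/nn_blocks.txt /tmp/dn_blocks.txt)
echo "orphaned on disk: $ORPHANS"
```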

I get into this situation where ‘hdfs dfs -du’ shows not much space in use:
sudo -u hdfs bin/hdfs dfs -du -h /
8.1 G    24.2 G   /app-logs
867      2.5 K    /benchmarks
2.0 G    6.0 G    /mr-history
762      2.2 K    /system
100.4 M  251.2 M  /user

I have nothing in Trash and no snapshots, but my dfsadmin report shows TBs of data in DFS Used:
Name:<> (m07dn06)
Hostname: m07dn06
Decommission Status : Normal
Configured Capacity: 108579574620160 (98.75 TB)
DFS Used: 1756550197248 (1.60 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 106822554660864 (97.15 TB)
DFS Used%: 1.62%
DFS Remaining%: 98.38%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Fri Mar 24 12:57:07 CDT 2017

Namenode logs show many block reports with invalidatedBlocks:
2017-03-24 12:49:37,625 INFO  BlockStateChange (BlockManager.java:processReport(2354)) - BLOCK*
processReport 0x19c92e070e3c2301: from storage DS-41ba227f-2a3e-45ac-b28c-1504e51d7cc2 node
DatanodeRegistration(<>, datanodeUuid=5be84f90-ba9c-4c85-94fd-e4d20369c4e4,
infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-57;cid=CID-ca8849f2-d722-45de-9848-ad50eeeabcf7;nsid=1923307298;c=1487788944154),
blocks: 498, hasStaleStorage: false, processing time: 0 msecs, invalidatedBlocks: 65
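One way to watch that backlog drain is to sum the invalidatedBlocks counters out of those block-report lines. A minimal grep/awk sketch; the first sample line is abbreviated from the excerpt above, the second is made up to show the summing:

```shell
#!/bin/sh
# Sum invalidatedBlocks across NameNode block-report log lines.
# Sample data; in practice pipe in the NameNode log instead.
LOG='2017-03-24 12:49:37,625 INFO BlockStateChange ... invalidatedBlocks: 65
2017-03-24 12:49:38,101 INFO BlockStateChange ... invalidatedBlocks: 12'

TOTAL=$(printf '%s\n' "$LOG" \
  | grep -o 'invalidatedBlocks: [0-9]*' \
  | awk '{s += $2} END {print s}')
echo "total pending invalidations: $TOTAL"
```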

Have a nice day,