hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
Date Wed, 08 Jun 2016 22:55:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321609#comment-15321609
] 

Arpit Agarwal commented on HDFS-9530:
-------------------------------------

I took a deeper look at the reservation code. It was painful to see my own lack of thoroughness.

I mostly agree with Brahma's analysis. Since reservation is done only for replicas created
via BlockReceiver, there are a couple of potential culprits where the reservation could be
leaked:
* Failure in {{DataXceiver#writeBlock}} after creating the BlockReceiver.
* BlockReceiver receives an unchecked exception after reserving.

Also agree with [~vinayrpet] that releasing via invalidate is the safer option although it
can lead to the reserved space hanging around longer. 

Do we agree on the following summary of the contract for when space should be reserved and
released?
# Space is reserved only when the on-disk block file is successfully created for an rbw/temporary
replica. This is verifiably true in FsDatasetImpl#createTemporary and FsDatasetImpl#createRbw
barring OOM when the ReplicaInPipeline/ReplicaBeingWritten is allocated.
# Space continues to be reserved as long as there is an rbw/temporary in the volumeMap.
# Space must be released either when the replica is finalized or it is invalidated. FsDatasetImpl#finalizeReplica
handles the finalize case. Fixing invalidate would close the remaining gap.
## Space may be released earlier if a failure is detected earlier e.g. exception in BlockReceiver
which we handle today.
## Space may also be released incrementally when some bytes are written to disk which is handled
via ReplicaInPipeline#setBytesAcked.

Thanks again for the detailed analysis on this one Brahma, [~raviprak] and Vinay. Nice work.

> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
>                 Key: HDFS-9530
>                 URL: https://issues.apache.org/jira/browse/HDFS-9530
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Fei Hui
>         Attachments: HDFS-9530-01.patch
>
>
> i think there are bugs in HDFS
> ===============================================================================
> here is config
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>
>         file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2
>     </value>
>   </property>
> here is dfsadmin report 
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> ================================================================================
> when running hive job , dfsadmin report as follows
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> but, df output is as follows on worker-1
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/xvda1      20641404 4229676  15363204  22% /
> tmpfs            8165456       0   8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764  16684088  15% /mnt/disk1
> df output conflitcs with dfsadmin report
> any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message