hadoop-hdfs-issues mailing list archives

From "Brahma Reddy Battula (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
Date Fri, 08 Apr 2016 02:16:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231505#comment-15231505 ]

Brahma Reddy Battula commented on HDFS-9530:
--------------------------------------------

To make the analysis simple:

1. Reservation happens only when a block is being received through {{BlockReceiver}}. Reservation
is done nowhere else, so no release is needed anywhere else either.
2. The {{BlockReceiver}} constructor has a try-catch that releases all of the reserved bytes if
any exception occurs after reserving.
3. {{BlockReceiver#receiveBlock()}} has a try-catch that releases all of the reserved bytes if
any exception occurs during receiving (see the sketch after this list).
4. While packets are received successfully, {{PacketResponder}} calls
{{ReplicaInPipeline#setBytesAcked(..)}}.
5. Once the block is completely received, {{FsDatasetImpl#finalizeReplica(..)}} releases all of
the remaining reserved bytes.
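
For reference, here is a minimal sketch of that release-on-exception pattern. It is not the
actual HDFS code; {{ReplicaInPipelineSketch}}, {{openStreams()}} and {{receivePackets()}} are
placeholders used only for illustration.

{code:java}
import java.io.IOException;

// Placeholder for the replica handle; releaseAllBytesReserved() stands in for
// "give the reserved RBW space back to the volume".
interface ReplicaInPipelineSketch {
  void releaseAllBytesReserved();
}

class BlockReceiverSketch {
  private final ReplicaInPipelineSketch replicaInfo;

  BlockReceiverSketch(ReplicaInPipelineSketch replicaInfo) throws IOException {
    this.replicaInfo = replicaInfo;
    try {
      openStreams();                           // space is already reserved at this point
    } catch (IOException ioe) {
      replicaInfo.releaseAllBytesReserved();   // point 2: constructor failure -> release all
      throw ioe;
    }
  }

  void receiveBlock() throws IOException {
    try {
      receivePackets();                        // point 4: bytesAcked advances as packets
                                               // are acknowledged by the responder
    } catch (IOException ioe) {
      replicaInfo.releaseAllBytesReserved();   // point 3: failure while receiving -> release all
      throw ioe;
    }
    // point 5: on success, finalizeReplica(..) later releases whatever is still reserved
  }

  private void openStreams() throws IOException { /* open block/checksum streams */ }

  private void receivePackets() throws IOException { /* read packets from the upstream node */ }
}
{code}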

The only place left is {{DataXceiver#writeBlock()}}: an exception can occur after the
{{BlockReceiver}} has been created but before {{BlockReceiver#receiveBlock()}} is called, for
example when the connection to the mirror nodes fails.
Only in this case are the reserved bytes not released, even though a ReplicaInfo instance has
already been created in the ReplicaMap. A sketch of this window follows.
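
Roughly, and again only as an illustration reusing the sketch types above (the real
{{DataXceiver#writeBlock()}} is much larger):

{code:java}
import java.io.IOException;

// Illustrative sketch of the gap described above, not the real DataXceiver code.
class DataXceiverSketch {
  void writeBlock(ReplicaInPipelineSketch replicaInfo, boolean mirrorReachable)
      throws IOException {
    // Constructing the receiver reserves the RBW space; by now the replica is
    // also registered in the ReplicaMap.
    BlockReceiverSketch receiver = new BlockReceiverSketch(replicaInfo);

    // The leak window: an exception thrown here (e.g. the mirror datanode is
    // unreachable) happens before receiveBlock() is ever entered, so none of the
    // release paths in points 2, 3 or 5 run and the reservation is never returned.
    if (!mirrorReachable) {
      throw new IOException("cannot connect to mirror");
    }

    receiver.receiveBlock();   // from here on, the normal release paths apply
  }
}
{code}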

Here, if the client re-creates the pipeline with the same blockId, the same ReplicaInfo instance
is reused, so no extra reservation happens. This can be verified with the same testcase as the
patch, but failing the pipeline for append instead: there abandonBlock is not called and the
pipeline is recovered for the same block.

But in the case of fresh block creation, the block is abandoned and a fresh block with a new
pipeline is requested.
The old block created on the Datanode is eventually deleted, BUT the reserved space is never
released. That is why you do not see many RBW blocks in the RBW directory, yet the reserved
space kept accumulating to > 1TB.
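
That leaked reservation is what inflates the dfsadmin figures, because the reported Non DFS Used
is not measured from disk; it is effectively derived as capacity minus DFS Used minus DFS
Remaining, and DFS Remaining already has the reserved-for-RBW bytes subtracted. Taking worker-1's
numbers from the report in the description:

{noformat}
Non DFS Used = Configured Capacity - DFS Used - DFS Remaining
             = 80256417792 - 10008850432 - 29943965184
             = 40303602176 bytes  (37.54 GB)
{noformat}

whereas the {{df}} output for worker-1 shows only about 10 GB actually used across the four data
disks, barely more than the 9.32 GB of DFS Used. The difference is reservation that was never
released, not real non-HDFS data.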

Though I have given two approaches, #1 (releasing the reserved bytes when ReplicaInPipeline
instances are deleted) will also cover any hidden cases, if there are any.
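
A rough sketch of what approach #1 amounts to, reusing the placeholder interface from the first
snippet (the real hook point would be wherever {{FsDatasetImpl}} drops the replica from the
ReplicaMap, e.g. when the abandoned block is invalidated):

{code:java}
// Illustrative sketch of approach #1: when a replica that is still in the write
// pipeline is removed from the volume map (e.g. because the client abandoned the
// block), return whatever it still has reserved.
class ReplicaRemovalSketch {
  void onReplicaRemoved(Object removedReplica) {
    if (removedReplica instanceof ReplicaInPipelineSketch) {
      // In the real code this would decrement the owning volume's
      // reserved-for-RBW counter; the sketch interface hides that detail.
      ((ReplicaInPipelineSketch) removedReplica).releaseAllBytesReserved();
    }
  }
}
{code}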

Hope this helps.

> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
>                 Key: HDFS-9530
>                 URL: https://issues.apache.org/jira/browse/HDFS-9530
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Fei Hui
>         Attachments: HDFS-9530-01.patch
>
>
> I think there are bugs in HDFS
> ===============================================================================
> here is the config
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>
>         file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2
>     </value>
>   </property>
> here is the dfsadmin report
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> ================================================================================
> when running a hive job, the dfsadmin report is as follows
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> but the df output on worker-1 is as follows
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/xvda1      20641404 4229676  15363204  22% /
> tmpfs            8165456       0   8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764  16684088  15% /mnt/disk1
> df output conflicts with the dfsadmin report
> any suggestions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
