hadoop-hdfs-issues mailing list archives

From "Takanobu Asanuma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.
Date Mon, 21 Nov 2016 03:38:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682397#comment-15682397
] 

Takanobu Asanuma commented on HDFS-10815:
-----------------------------------------

Thanks for reporting this issue, [~ademu].

I think this bug (and HDFS-10775) may already have been fixed by HDFS-10858. Before that fix,
when DataNodes sent full block reports containing both EC blocks and replicated blocks, the
NameNode sometimes handled them incorrectly, and it eventually stopped the recovery process.

Please try running the test again with the latest trunk branch.

> The state of the EC file is erroneously recognized when you restart the NameNode.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-10815
>                 URL: https://issues.apache.org/jira/browse/HDFS-10815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>         Environment: 2 NameNodes, 5 DataNodes, erasure coding policy is set to "RS-DEFAULT-3-2-64k"
>            Reporter: Eisuke Umeda
>
> After running the test with the procedure below, EC files came to be recognized as corrupt files.
> These files could still be retrieved with "hdfs dfs -get".
> The NameNode might be causing the false recognition.
> DataNodes: datanode[1-5]
> Rack awareness: not set
> Copy target files: /tmp/tpcds-generate/25/store_sales/*
> {code}
> $ hdfs dfs -ls /tmp/tpcds-generate/25/store_sales
> Found 25 items
> -rw-r--r--   0 root supergroup  399430918 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00000
> -rw-r--r--   0 root supergroup  399054598 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00001
> -rw-r--r--   0 root supergroup  399329373 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00002
> -rw-r--r--   0 root supergroup  399528459 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00003
> -rw-r--r--   0 root supergroup  399329624 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00004
> -rw-r--r--   0 root supergroup  399085924 2016-08-16 15:11 /tmp/tpcds-generate/25/store_sales/data-m-00005
> -rw-r--r--   0 root supergroup  399337384 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00006
> -rw-r--r--   0 root supergroup  399199458 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00007
> -rw-r--r--   0 root supergroup  399679096 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00008
> -rw-r--r--   0 root supergroup  399440431 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00009
> -rw-r--r--   0 root supergroup  399403931 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00010
> -rw-r--r--   0 root supergroup  399472465 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00011
> -rw-r--r--   0 root supergroup  399451784 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00012
> -rw-r--r--   0 root supergroup  399240168 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00013
> -rw-r--r--   0 root supergroup  399370507 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00014
> -rw-r--r--   0 root supergroup  399633351 2016-08-16 15:12 /tmp/tpcds-generate/25/store_sales/data-m-00015
> -rw-r--r--   0 root supergroup  396532952 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00016
> -rw-r--r--   0 root supergroup  396258715 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00017
> -rw-r--r--   0 root supergroup  396382486 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00018
> -rw-r--r--   0 root supergroup  399016456 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00019
> -rw-r--r--   0 root supergroup  399465745 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00020
> -rw-r--r--   0 root supergroup  399208235 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00021
> -rw-r--r--   0 root supergroup  399198296 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00022
> -rw-r--r--   0 root supergroup  399599711 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00023
> -rw-r--r--   0 root supergroup  395150855 2016-08-16 15:13 /tmp/tpcds-generate/25/store_sales/data-m-00024
> {code}
> NameNodes:
>   namenode1(active)
>   namenode2(standby)
> The directory that has "Under-erasure-coded block groups": /tmp/tpcds-generate/test
> {code}
> $ sudo -u hdfs hdfs erasurecode -getPolicy /tmp/tpcds-generate/test
> ErasureCodingPolicy=[Name=RS-DEFAULT-3-2-64k, Schema=[ECSchema=[Codec=rs-default, numDataUnits=3, numParityUnits=2]], CellSize=65536 ]
> {code}
> The following are the steps to reproduce:
> 1) hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test
> 2) datanode1: (in the middle of the copy) sudo pkill -9 -f datanode
> 3) start the datanode process on datanode1 again two minutes later
> 4) run hdfs fsck and confirm that Under-Replicated Blocks are reported
> 5) wait until Under-Replicated Blocks becomes 0
> 6) (namenode1) /etc/init.d/hadoop-hdfs-namenode restart
> 7) (namenode2) /etc/init.d/hadoop-hdfs-namenode restart
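
For anyone retrying this on trunk, the reproduction steps quoted above can be sketched as a
small script. This is only a sketch, not something taken from the report: the use of ssh to
reach each host, the DataNode init script path (guessed by analogy with the NameNode one),
the KILL_DELAY value, and the final fsck after the restarts are all assumptions. By default
it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report. Prints commands by default.
set -eu

DRY_RUN="${DRY_RUN:-1}"        # 1 (default): only print each command; 0: execute it
KILL_DELAY="${KILL_DELAY:-30}" # assumption: seconds into the copy before the kill
SRC='/tmp/tpcds-generate/25/store_sales/*'  # glob is left for HDFS to expand
DST='/tmp/tpcds-generate/test'
# Assumption: the report does not say how datanode1 was started again.
DN_INIT="${DN_INIT:-/etc/init.d/hadoop-hdfs-datanode}"
NN_INIT='/etc/init.d/hadoop-hdfs-namenode'

# Run a command in the foreground, or just print it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Same as run, but start the command in the background.
run_bg() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s &\n' "$*"
  else
    "$@" &
  fi
}

reproduce() {
  run_bg hdfs dfs -cp "$SRC" "$DST"              # step 1: start the copy
  run sleep "$KILL_DELAY"
  run ssh datanode1 sudo pkill -9 -f datanode    # step 2: kill in the middle of the copy
  run sleep 120                                  # step 3: two minutes later ...
  run ssh datanode1 "$DN_INIT" start             # ... start the DataNode again
  if [ "$DRY_RUN" != "1" ]; then
    wait                                         # assumption: let the copy finish first
  fi
  run hdfs fsck "$DST"                           # step 4: expect Under-Replicated Blocks
  echo '# step 5: repeat the fsck above until Under-Replicated Blocks becomes 0'
  run ssh namenode1 "$NN_INIT" restart           # step 6
  run ssh namenode2 "$NN_INIT" restart           # step 7
  run hdfs fsck "$DST"                           # assumption: re-check after the restarts
}

reproduce
```

Step 5 is left as a manual check on purpose, since the exact fsck output format may differ
between versions.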




