hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-8299) HDFS reporting missing blocks when they are actually present due to read-only filesystem
Date Thu, 30 Apr 2015 13:33:06 GMT
Hari Sekhon created HDFS-8299:
---------------------------------

             Summary: HDFS reporting missing blocks when they are actually present due to
read-only filesystem
                 Key: HDFS-8299
                 URL: https://issues.apache.org/jira/browse/HDFS-8299
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
         Environment: Fsck shows missing blocks when the blocks can be found on a datanode's
filesystem and the datanode has been restarted to try to get it to recognize that the blocks
are indeed present and hence report them to the NameNode in a block report.

Fsck output showing an example "missing" block:
{code}/apps/hive/warehouse/<custom_scrubbed>.db/someTable/000000_0: CORRUPT blockpool
BP-120244285-<ip>-1417023863606 block blk_1075202330
 MISSING 1 blocks of total size 3260848 B
0. BP-120244285-<ip>-1417023863606:blk_1075202330_1484191 len=3260848 MISSING!{code}
The block is definitely present on more than one datanode however, here is the output from
one of them that I restarted to try to get it to report the block to the NameNode:
{code}# ll /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
-rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
-rw-r--r-- 1 hdfs 499   25483 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta{code}
It's worth noting that this is on HDFS tiered storage on an archive tier going to a networked
block device that may have become temporarily unavailable but is available now. See also feature
request HDFS-8297 for online rescan to not have to go around restarting datanodes.

It turns out in the datanode log (that I am attaching) this is because the datanode fails
to get a write lock on the filesystem. I think it would be better to be able to read-only
those blocks however, since this way causes client visible data unavailability when the data
could in fact be read.

{code}2015-04-30 14:11:08,235 WARN  datanode.DataNode (DataNode.java:checkStorageLocations(2284))
- Invalid dfs.datanode.data.dir /archive1/dn :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
        at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
        at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
        at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
{code}

Hari Sekhon
http://www.linkedin.com/in/harisekhon
            Reporter: Hari Sekhon
            Priority: Critical
         Attachments: datanode.log





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message