hadoop-hdfs-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8299) HDFS reporting missing blocks when they are actually present due to read-only filesystem
Date Fri, 01 May 2015 08:09:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522917#comment-14522917 ]

Hari Sekhon commented on HDFS-8299:

To clarify, a read-only filesystem should not prevent the blocks from being included in the
block report to the NameNode and reported as existing; it should merely prevent new block
writes to that partition until the problem is resolved.
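
The behaviour proposed above can be sketched as a three-way classification of a data directory, rather than a single pass/fail writability check. This is only an illustrative sketch: `VolumeCheck`, `Mode` and `classify` are hypothetical names, not part of Hadoop, and this is not how `DiskChecker` is implemented. It assumes that a directory which is readable and traversable but not writable could still have its blocks scanned and reported.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class VolumeCheck {

    /** How a data directory could be treated by a datanode. */
    enum Mode { READ_WRITE, READ_ONLY, FAILED }

    /**
     * Classifies a directory by what the current process can do with it.
     * A directory that can be listed and traversed but not written to is
     * reported as READ_ONLY instead of being rejected outright.
     */
    static Mode classify(Path dir) {
        // Existing blocks can only be found if the directory can be
        // listed (read) and traversed (execute).
        if (!Files.isDirectory(dir) || !Files.isReadable(dir) || !Files.isExecutable(dir)) {
            return Mode.FAILED;
        }
        // Not writable (for example, a read-only mount): existing blocks
        // could still be served, but no new blocks should be placed here.
        return Files.isWritable(dir) ? Mode.READ_WRITE : Mode.READ_ONLY;
    }

    public static void main(String[] args) {
        // Usage: java VolumeCheck /archive1/dn /some/other/dir
        for (String arg : args) {
            System.out.println(arg + " -> " + classify(Paths.get(arg)));
        }
    }
}
```

Under this sketch, a directory such as /archive1/dn on a read-only mount would be classified as READ_ONLY rather than being dropped from the set of storage locations.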

> HDFS reporting missing blocks when they are actually present due to read-only filesystem
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-8299
>                 URL: https://issues.apache.org/jira/browse/HDFS-8299
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: datanode.log
> Fsck shows missing blocks even though the blocks can be found on a datanode's filesystem and
> the datanode has been restarted to try to get it to recognize that the blocks are indeed present
> and hence report them to the NameNode in a block report.
> Fsck output showing an example "missing" block:
> {code}/apps/hive/warehouse/<custom_scrubbed>.db/someTable/000000_0: CORRUPT blockpool BP-120244285-<ip>-1417023863606 block blk_1075202330
>  MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285-<ip>-1417023863606:blk_1075202330_1484191 len=3260848 MISSING!{code}
> The block is definitely present on more than one datanode, however. Here is the output
> from one of them, which I restarted to try to get it to report the block to the NameNode:
> {code}# ll /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499   25483 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta{code}
> It's worth noting that this is on HDFS tiered storage, on an archive tier going to a networked
> block device that may have become temporarily unavailable but is available now. See also feature
> request HDFS-8297 for an online rescan, so that datanodes do not have to be restarted.
> The datanode log (which I am attaching) shows that this is because the datanode fails
> to get a write lock on the filesystem. I think it would be better to serve those blocks
> read-only, however, since the current behaviour causes client-visible data unavailability
> when the data could in fact be read.
> {code}2015-04-30 14:11:08,235 WARN  datanode.DataNode (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}

This message was sent by Atlassian JIRA