hadoop-hdfs-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8299) HDFS reporting missing blocks when they are actually present due to read-only filesystem
Date Fri, 01 May 2015 08:07:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522913#comment-14522913 ]

Hari Sekhon commented on HDFS-8299:

Yes, that's what I figured too, but I'm suggesting that just because a write lock cannot be
obtained doesn't mean the blocks can't be read when they are clearly there.

Instead of causing user-visible data unavailability, the datanode should still provide access
to the data, with any new writes going to other nodes/partitions. The partition's read-only
state (due to some underlying ext4 filesystem issue) would also need to be reported in the
NameNode JSP pages, dfsadmin -report, etc.
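The proposal above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Hadoop code: the class VolumeStateSketch, the VolumeState enum and its three states are invented names, and plain java.io.File permission checks stand in for whatever health check a real DataNode would run. A directory that is readable but not writable is kept for serving existing blocks and excluded from new replica placement.

```java
import java.io.File;

// Hypothetical sketch of the proposed behaviour; none of these names exist in Hadoop.
// A volume that is readable but not writable keeps serving reads but takes no new blocks.
final class VolumeStateSketch {

    enum VolumeState { READ_WRITE, READ_ONLY, FAILED }

    // Classify a data directory from basic permission flags.
    static VolumeState classify(boolean readable, boolean writable, boolean executable) {
        if (!readable || !executable) {
            return VolumeState.FAILED; // cannot list or open block files at all
        }
        return writable ? VolumeState.READ_WRITE : VolumeState.READ_ONLY;
    }

    // Same classification, using the flags java.io.File reports for a real path.
    static VolumeState classify(File dir) {
        if (!dir.isDirectory()) {
            return VolumeState.FAILED;
        }
        return classify(dir.canRead(), dir.canWrite(), dir.canExecute());
    }

    // Existing replicas on this volume may still be served to clients.
    static boolean servesReads(VolumeState s) {
        return s != VolumeState.FAILED;
    }

    // Only fully healthy volumes receive new replicas.
    static boolean acceptsWrites(VolumeState s) {
        return s == VolumeState.READ_WRITE;
    }

    public static void main(String[] args) {
        for (String path : args) {
            VolumeState s = classify(new File(path));
            System.out.println(path + " -> " + s
                + " (reads=" + servesReads(s) + ", writes=" + acceptsWrites(s) + ")");
        }
    }
}
```

Pass one or more data directory paths as arguments to see how each would be treated; a read-only volume would be reported as READ_ONLY with reads=true and writes=false.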

> HDFS reporting missing blocks when they are actually present due to read-only filesystem
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-8299
>                 URL: https://issues.apache.org/jira/browse/HDFS-8299
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: datanode.log
> Fsck shows missing blocks even though the blocks can be found on a datanode's filesystem,
and the datanode has been restarted to try to get it to recognize that the blocks are present
and report them to the NameNode in a block report.
> Fsck output showing an example "missing" block:
> {code}/apps/hive/warehouse/<custom_scrubbed>.db/someTable/000000_0: CORRUPT blockpool BP-120244285-<ip>-1417023863606 block blk_1075202330
>  MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285-<ip>-1417023863606:blk_1075202330_1484191 len=3260848 MISSING!{code}
> However, the block is definitely present on more than one datanode. Here is the output
from one of them, which I restarted to try to get it to report the block to the NameNode:
> {code}# ll /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499   25483 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta{code}
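The fsck entry and the listing can be cross-checked with a little arithmetic. The sketch below is illustrative and rests on two assumptions that are not stated in the report: that this datanode uses the block-ID-based directory layout where the two subdir names come from bits 16-23 and 8-15 of the block ID, and that the .meta file holds a 7-byte header plus one 4-byte CRC per 512-byte chunk (dfs.bytes-per-checksum = 512). BlockLayoutSketch and its methods are hypothetical names. Under those assumptions, blk_1075202330 (0x4016491A) maps to subdir22/subdir73, and a 3260848-byte block needs 6369 chunks, giving 7 + 6369 × 4 = 25483 bytes. Both values match the listing, which is consistent with the on-disk replica being complete.

```java
// Hypothetical helper relating the fsck entry to the on-disk files.
// Assumes the block-ID-based layout (subdirs from bits 16-23 and 8-15 of the ID)
// and 512-byte checksum chunks with 4-byte CRCs; neither is confirmed by the report.
final class BlockLayoutSketch {

    // Path of the finalized block file relative to .../current/finalized.
    static String blockPath(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return "subdir" + d1 + "/subdir" + d2 + "/blk_" + blockId;
    }

    // The checksum file name also carries the generation stamp.
    static String metaFileName(long blockId, long genStamp) {
        return "blk_" + blockId + "_" + genStamp + ".meta";
    }

    // Expected .meta size: 7-byte header (2-byte version, 1-byte checksum type,
    // 4-byte bytes-per-checksum) plus one checksum per chunk, last chunk rounded up.
    static long expectedMetaSize(long blockLen, int bytesPerChecksum, int checksumSize) {
        long chunks = (blockLen + bytesPerChecksum - 1) / bytesPerChecksum;
        return 7 + chunks * checksumSize;
    }

    public static void main(String[] args) {
        // Values from the fsck output: blk_1075202330_1484191 len=3260848
        long blockId = 1075202330L;
        long genStamp = 1484191L;
        long len = 3260848L;
        System.out.println(blockPath(blockId));              // subdir22/subdir73/blk_1075202330
        System.out.println(metaFileName(blockId, genStamp)); // blk_1075202330_1484191.meta
        System.out.println(expectedMetaSize(len, 512, 4));   // 25483
    }
}
```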
> It's worth noting that this is HDFS tiered storage, on an archive tier backed by a networked
block device that may have become temporarily unavailable but is available now. See also feature
request HDFS-8297 for an online rescan, so that it isn't necessary to go around restarting datanodes.
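An online rescan of the kind requested in HDFS-8297 would, at minimum, need to rediscover the finalized replicas on a volume without restarting the datanode. The sketch below shows only that discovery step and is not how Hadoop implements scanning: FinalizedScanSketch, parseMeta and scan are hypothetical names, and it simply assumes that a replica is a blk_<id> data file sitting next to a blk_<id>_<genstamp>.meta file, as in the listing above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical discovery step for an online rescan; not Hadoop's implementation.
final class FinalizedScanSketch {

    // Matches checksum files such as blk_1075202330_1484191.meta (IDs may be negative).
    private static final Pattern META = Pattern.compile("blk_(-?\\d+)_(\\d+)\\.meta");

    // Parse a .meta file name into {blockId, genStamp}, if it matches.
    static Optional<long[]> parseMeta(String fileName) {
        Matcher m = META.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))});
    }

    // Walk a finalized directory and map block ID -> generation stamp for every
    // .meta file whose data file (blk_<id>) sits in the same directory.
    static Map<Long, Long> scan(Path finalizedDir) throws IOException {
        Map<Long, Long> replicas = new TreeMap<>();
        try (Stream<Path> files = Files.walk(finalizedDir)) {
            files.filter(Files::isRegularFile).forEach(p ->
                parseMeta(p.getFileName().toString()).ifPresent(ids -> {
                    Path data = p.resolveSibling("blk_" + ids[0]);
                    if (Files.isRegularFile(data)) {
                        replicas.put(ids[0], ids[1]);
                    }
                }));
        }
        return replicas;
    }

    public static void main(String[] args) throws IOException {
        // Argument: a .../current/<block pool>/current/finalized directory
        scan(Paths.get(args[0])).forEach((id, gs) ->
            System.out.println("blk_" + id + " genstamp=" + gs));
    }
}
```

Reading the directory tree needs only read and execute permission, so this step would still work on a volume that has gone read-only.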
> The datanode log (attached) shows that this is because the datanode fails to get a write
lock on the filesystem. I think it would be better to still be able to read those blocks in
read-only mode, since the current behaviour causes client-visible data unavailability when
the data could in fact be read.
> {code}2015-04-30 14:11:08,235 WARN  datanode.DataNode (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
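The stack trace shows the directory being rejected inside DiskChecker.checkAccessByFileMethods. The sketch below is a simplified approximation of that kind of check, not Hadoop's source: DirCheckSketch and DiskErrorSketchException are hypothetical stand-ins, and the readable/writable/executable order is assumed. It illustrates the point of the report: on a read-only mount the read check passes but the write check throws, so the whole data directory is dropped even though its block files remain readable.

```java
import java.io.File;
import java.io.IOException;

// Simplified, hypothetical approximation of a per-directory access check;
// names and exception type are stand-ins, not Hadoop's DiskChecker.
final class DirCheckSketch {

    static final class DiskErrorSketchException extends IOException {
        DiskErrorSketchException(String msg) { super(msg); }
    }

    // All three permissions are required; a read-only mount passes the read check
    // but fails the write check, so the whole directory is rejected.
    static void checkAccess(String dir, boolean canRead, boolean canWrite, boolean canExecute)
            throws DiskErrorSketchException {
        if (!canRead) throw new DiskErrorSketchException("Directory is not readable: " + dir);
        if (!canWrite) throw new DiskErrorSketchException("Directory is not writable: " + dir);
        if (!canExecute) throw new DiskErrorSketchException("Directory is not executable: " + dir);
    }

    // Same check using the flags java.io.File reports for a real path.
    static void checkAccess(File dir) throws DiskErrorSketchException {
        checkAccess(dir.getPath(), dir.canRead(), dir.canWrite(), dir.canExecute());
    }

    public static void main(String[] args) {
        try {
            checkAccess("/archive1/dn", true, false, true); // simulate a read-only mount
        } catch (DiskErrorSketchException e) {
            System.out.println(e.getMessage()); // Directory is not writable: /archive1/dn
        }
    }
}
```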

This message was sent by Atlassian JIRA
