Date: Fri, 1 May 2015 08:09:06 +0000 (UTC)
From: "Hari Sekhon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8299) HDFS reporting missing blocks when they are actually present due to read-only filesystem

    [ https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522917#comment-14522917 ]

Hari Sekhon commented on HDFS-8299:
-----------------------------------

To clarify: a read-only filesystem should not prevent its blocks from being included in the block report to the NameNode and reported as present; it should merely prevent new block writes to that partition until the condition is resolved.
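
A minimal sketch of the distinction being proposed, using plain java.io access checks and made-up names (ReadOnlyVolumeSketch, checkDirStrict, checkDirTolerant) rather than the actual DiskChecker/DataNode code:

{code}
import java.io.File;
import java.io.IOException;

// Illustrative sketch only; names and checks are simplified and are not
// the real Hadoop DiskChecker/DataNode logic.
public class ReadOnlyVolumeSketch {

    // Strict check, in the spirit of the current behaviour: a readable
    // but non-writable directory still fails outright, which takes every
    // block on that volume out of the block report.
    static void checkDirStrict(File dir) throws IOException {
        if (!dir.canRead()) {
            throw new IOException("Directory is not readable: " + dir);
        }
        if (!dir.canWrite()) {
            throw new IOException("Directory is not writable: " + dir);
        }
    }

    // Tolerant variant: a readable but non-writable volume stays in
    // service for reads and block reports; only new block writes would
    // be refused. Returns true when the volume may also accept writes.
    static boolean checkDirTolerant(File dir) throws IOException {
        if (!dir.canRead()) {
            throw new IOException("Directory is not readable: " + dir);
        }
        return dir.canWrite();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "/archive1/dn");
        boolean writable = checkDirTolerant(dir);
        System.out.println(dir + ": readable, writable=" + writable
                + (writable ? "" : " (serve existing blocks read-only)"));
    }
}
{code}

Under the tolerant variant, a read-only data directory would still contribute its blocks to the block report and merely refuse new writes; "not writable" becomes a volume state rather than a fatal startup failure.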

> HDFS reporting missing blocks when they are actually present due to read-only filesystem
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8299
>                 URL: https://issues.apache.org/jira/browse/HDFS-8299
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: datanode.log
>
>
> Fsck shows missing blocks even though the blocks can be found on a datanode's filesystem, and even after the datanode has been restarted to try to get it to recognize that the blocks are indeed present and report them to the NameNode in a block report.
> Fsck output showing an example "missing" block:
> {code}
> /apps/hive/warehouse/.db/someTable/000000_0: CORRUPT blockpool BP-120244285--1417023863606 block blk_1075202330
>  MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285--1417023863606:blk_1075202330_1484191 len=3260848 MISSING!
> {code}
> The block is definitely present on more than one datanode, however. Here is the output from one of them, which I restarted to try to get it to report the block to the NameNode:
> {code}
> # ll /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499   25483 Apr 27 15:02 /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta
> {code}
> It's worth noting that this is HDFS tiered storage: the archive tier is on a networked block device that may have become temporarily unavailable but is available now. See also feature request HDFS-8297 for an online rescan, to avoid having to go around restarting datanodes.
> It turns out from the datanode log (attached) that this is because the datanode fails to get a write lock on the filesystem. I think it would be better to serve those blocks read-only, since the current behaviour causes client-visible data unavailability when the data could in fact be read.
> {code}
> 2015-04-30 14:11:08,235 WARN  datanode.DataNode (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
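
A related mitigation, separate from fixing the reporting itself: the datanode's tolerance for failed storage directories is configurable, so a single read-only mount need not take the whole datanode down. The property below is the standard hdfs-site.xml setting; the value of 1 is only an example, and blocks on the rejected volume would still be absent from the block report:

{code}
<!-- hdfs-site.xml: number of dfs.datanode.data.dir volumes that may
     fail their disk check before the datanode refuses to run
     (default 0, i.e. any failed volume is fatal). -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
{code}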