Date: Fri, 1 May 2015 08:09:06 +0000 (UTC)
From: "Hari Sekhon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8299) HDFS reporting missing blocks when they are actually present due to read-only filesystem

    [ https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522917#comment-14522917 ]

Hari Sekhon commented on HDFS-8299:
-----------------------------------

To clarify: a read-only filesystem should not prevent its blocks from being included in the block report to the NameNode and reported as present; it should merely prevent new block writes to that partition until the condition is resolved.
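
A minimal sketch of the distinction being proposed, using plain java.io access checks and made-up names (ReadOnlyVolumeSketch, checkDirStrict, checkDirTolerant) rather than the actual DiskChecker/DataNode code:

{code}
import java.io.File;
import java.io.IOException;

// Illustrative sketch only; names and checks are simplified and are not
// the real Hadoop DiskChecker/DataNode logic.
public class ReadOnlyVolumeSketch {

    // Strict check, in the spirit of the current behaviour: a readable
    // but non-writable directory still fails outright, which takes every
    // block on that volume out of the block report.
    static void checkDirStrict(File dir) throws IOException {
        if (!dir.canRead()) {
            throw new IOException("Directory is not readable: " + dir);
        }
        if (!dir.canWrite()) {
            throw new IOException("Directory is not writable: " + dir);
        }
    }

    // Tolerant variant: a readable but non-writable volume stays in
    // service for reads and block reports; only new block writes would
    // be refused. Returns true when the volume may also accept writes.
    static boolean checkDirTolerant(File dir) throws IOException {
        if (!dir.canRead()) {
            throw new IOException("Directory is not readable: " + dir);
        }
        return dir.canWrite();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "/archive1/dn");
        boolean writable = checkDirTolerant(dir);
        System.out.println(dir + ": readable, writable=" + writable
                + (writable ? "" : " (serve existing blocks read-only)"));
    }
}
{code}

Under the tolerant variant, a read-only data directory would still contribute its blocks to the block report and merely refuse new writes; "not writable" becomes a volume state rather than a fatal startup failure.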

> HDFS reporting missing blocks when they are actually present due to read-only filesystem
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8299
>                 URL: https://issues.apache.org/jira/browse/HDFS-8299
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: datanode.log
>
>
> Fsck shows missing blocks even though the blocks can be found on a datanode's filesystem, and even after the datanode has been restarted to try to get it to recognize that the blocks are indeed present and report them to the NameNode in a block report.
> Fsck output showing an example "missing" block:
> {code}
> /apps/hive/warehouse/.db/someTable/000000_0: CORRUPT blockpool BP-120244285--1417023863606 block blk_1075202330
>  MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285--1417023863606:blk_1075202330_1484191 len=3260848 MISSING!
> {code}
> The block is definitely present on more than one datanode, however. Here is the output from one of them, which I restarted to try to get it to report the block to the NameNode:
> {code}
> # ll /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499   25483 Apr 27 15:02 /archive1/dn/current/BP-120244285--1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta
> {code}
> It's worth noting that this is HDFS tiered storage: the archive tier is on a networked block device that may have become temporarily unavailable but is available now. See also feature request HDFS-8297 for an online rescan, to avoid having to go around restarting datanodes.
> It turns out from the datanode log (attached) that this is because the datanode fails to get a write lock on the filesystem. I think it would be better to serve those blocks read-only, since the current behaviour causes client-visible data unavailability when the data could in fact be read.
> {code}
> 2015-04-30 14:11:08,235 WARN  datanode.DataNode (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
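
A related mitigation, separate from fixing the reporting itself: the datanode's tolerance for failed storage directories is configurable, so a single read-only mount need not take the whole datanode down. The property below is the standard hdfs-site.xml setting; the value of 1 is only an example, and blocks on the rejected volume would still be absent from the block report:

{code}
<!-- hdfs-site.xml: number of dfs.datanode.data.dir volumes that may
     fail their disk check before the datanode refuses to run
     (default 0, i.e. any failed volume is fatal). -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
{code}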