From: "Wei-Chiu Chuang (JIRA)"
To: hdfs-dev@hadoop.apache.org
Reply-To: hdfs-dev@hadoop.apache.org
Date: Tue, 8 Mar 2016 20:15:41 +0000 (UTC)
Subject: [jira] [Created] (HDFS-9923) Datanode disk failure handling should be improved (consistently?)

Wei-Chiu Chuang created HDFS-9923:
-------------------------------------

    Summary: Datanode disk failure handling should be improved (consistently?)
        Key: HDFS-9923
        URL: https://issues.apache.org/jira/browse/HDFS-9923
    Project: Hadoop HDFS
 Issue Type: Improvement
 Components: datanode
   Reporter: Wei-Chiu Chuang

Disk failures are hard to handle. This JIRA is created to discuss and improve disk failure handling in a more consistent manner.
For one thing, disks can fail in many different ways: the hardware might be failing, disk space may be full, there may be checksum errors, and so on. For another, the hardware abstracts away the details, so it is hard for software to handle them.

There are currently three disk check mechanisms in HDFS, as far as I know: {{BlockScanner}}, {{BlockPoolSlice#checkDirs}} and {{DU}}, and disk errors are handled differently by each. This JIRA is more focused on {{DU}} error handling. {{DU}} may emit errors like this:

{noformat}
2016-02-18 02:23:36,224 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while scanning /data/8/dfs/dn/current. Will throw later. ExitCodeException exitCode=1: du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909_14954023.meta': Input/output error
{noformat}

While working on HDFS-9908 (Datanode should tolerate disk scan failure during NN handshake), I found that {{DU}} errors are not handled consistently; it all depends on who catches the exception. For example:
* if {{DU}} returns an error during NN handshake, the DN will not be able to join the cluster at all (HDFS-9908);
* however, if the same exception is caught in {{BlockPoolSlice#saveDfsUsed}}, the data node will only log a warning and do nothing (HDFS-5498);
* in some cases (e.g. {{BlockReceiver#(constructor)}}), the exception handler invokes {{BlockPoolSlice#checkDirs}}, but since it only checks three directories, it is very unlikely to find the files that have the error.

So my ask is: should the error be handled in a consistent manner? Should the data node report disk failures to the name node (this is the {{BlockScanner}} approach), and should the data node take the volume offline automatically if {{DU}} returns an error?
(this is the {{checkDirs}} approach)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
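To make the proposal above concrete, here is a minimal, hypothetical sketch of the kind of single decision point the issue is asking for: one place that classifies a disk-check failure and picks an action, so that the DU, {{saveDfsUsed}}, and {{checkDirs}} call sites stop disagreeing. The class name, enum, and classification rules are all illustrative assumptions, not Hadoop code; only the behaviors being mirrored (fatal I/O error vs. the warn-and-continue path of HDFS-5498) come from the discussion above.

```java
// Hypothetical sketch, NOT actual Hadoop code: a single decision point for
// disk-check failures so every caller handles them the same way.
public class DiskFailureSketch {

    // Possible responses, mirroring the behaviors described in the issue.
    public enum Action { LOG_ONLY, REPORT_TO_NAMENODE, TAKE_VOLUME_OFFLINE }

    // Classify a failure by where it was caught and what the error says.
    public static Action handle(String context, String message) {
        if (message.contains("Input/output error")) {
            // Hardware-level error: take the volume offline automatically,
            // rather than letting the outcome depend on who caught it.
            return Action.TAKE_VOLUME_OFFLINE;
        }
        if (context.equals("saveDfsUsed")) {
            // Non-fatal bookkeeping failure: warn and continue, as the
            // current HDFS-5498 behavior does.
            return Action.LOG_ONLY;
        }
        // Everything else: surface the failure to the NameNode, as the
        // BlockScanner approach does.
        return Action.REPORT_TO_NAMENODE;
    }

    public static void main(String[] args) {
        System.out.println(handle("DU",
                "du: cannot access `...': Input/output error"));
        System.out.println(handle("saveDfsUsed", "failed to persist dfsUsed"));
    }
}
```

The point of the sketch is only that the classification lives in one method: whether a DU failure during NN handshake, a {{saveDfsUsed}} write failure, or a block-receive error, each caller would delegate here instead of improvising its own policy.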