Date: Fri, 25 Jul 2014 11:38:38 +0000 (UTC)
From: "J.Andreina (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Subject: [jira] [Created] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

J.Andreina created HDFS-6753:
--------------------------------

Summary: When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
Key: HDFS-6753
URL: https://issues.apache.org/jira/browse/HDFS-6753
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina


Env Details :
=============
Cluster has 3 Datanodes
Cluster installed with "Rex" user
dfs.datanode.failed.volumes.tolerated = 3
dfs.blockreport.intervalMsec = 18000
dfs.datanode.directoryscan.interval = 120
DN_XX1.XX1.XX1.XX1 data dir = /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data

Permission is denied on /home/REX/data/dfs1/data, /home/REX/data/dfs2/data and /opt/REX/dfs/data ( hence the DN considers these three volumes as failed )

Expected behavior is observed when disk is not full:
========================================
Step 1: Change the permissions of /mnt/tmp_Datanode to root
Step 2: Perform write operations ( DN detects that all configured volumes have failed and shuts down )

Scenario 1:
===========
Step 1: Make the /mnt/tmp_Datanode disk full and change its permissions to root
Step 2: Perform client write operations ( a disk full exception is thrown, but the Datanode does not shut down, even though all configured volumes have failed )

{noformat}
2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=4096 B) is less than the block size (=134217728 B).
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
{noformat}

Observations :
==============
1. Write operations do not shut down the Datanode, even though all configured volumes have failed ( one of the disks is full and permission is denied on all of the disks )
2. Directory scanning fails, yet the DN still does not shut down

{noformat}
2014-07-21 14:13:00,180 WARN org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured while compiling report:
java.io.IOException: Invalid directory or I/O error occurred for dir: /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
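
The behaviour reported above can be modelled with a small, self-contained sketch. This is not the Hadoop implementation: the class VolumeFailureSketch, its Volume type, and the methods chooseVolume and shouldShutDown are hypothetical names introduced only for illustration. The sketch assumes that the shutdown decision compares the number of volumes already marked as failed against dfs.datanode.failed.volumes.tolerated, and that an out-of-space rejection on the write path does not by itself mark a volume as failed. Under those assumptions, a full last volume keeps the failed count at 3, which does not exceed the tolerated value of 3.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative model only. Every name in this class is hypothetical and is
// not part of Hadoop's API.
class VolumeFailureSketch {

    /** Hypothetical stand-in for one DataNode storage volume. */
    static final class Volume {
        final String path;
        final long availableBytes;

        Volume(String path, long availableBytes) {
            this.path = path;
            this.availableBytes = availableBytes;
        }
    }

    /** Hypothetical stand-in for an out-of-space error on the write path. */
    static final class OutOfSpaceException extends IOException {
        OutOfSpaceException(String message) {
            super(message);
        }
    }

    /**
     * Returns the first volume with more free space than one block.
     * Assumption: running out of space only rejects the write; it does not
     * change the number of volumes considered failed.
     */
    static Volume chooseVolume(List<Volume> usable, long blockSize)
            throws OutOfSpaceException {
        long maxAvailable = 0;
        for (Volume v : usable) {
            if (v.availableBytes > blockSize) {
                return v;
            }
            maxAvailable = Math.max(maxAvailable, v.availableBytes);
        }
        throw new OutOfSpaceException("Out of space: most available = "
                + maxAvailable + " B, block size = " + blockSize + " B");
    }

    /** Assumption: shut down only when failures exceed the tolerated count. */
    static boolean shouldShutDown(int failedVolumes, int toleratedFailures) {
        return failedVolumes > toleratedFailures;
    }

    public static void main(String[] args) {
        int tolerated = 3;      // dfs.datanode.failed.volumes.tolerated
        int failedVolumes = 3;  // the three permission-denied directories
        List<Volume> usable =
                Arrays.asList(new Volume("/mnt/tmp_Datanode", 4096L));
        try {
            chooseVolume(usable, 134217728L);
        } catch (OutOfSpaceException e) {
            // The write is rejected, but failedVolumes is left unchanged.
            System.out.println("write rejected: " + e.getMessage());
        }
        System.out.println("shut down? "
                + shouldShutDown(failedVolumes, tolerated));
    }
}
```

In this model the fourth volume would only count as failed if a disk health check ran against it and detected the permission error; an out-of-space exception raised earlier in the write path leaves the failure count where it was.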