Date: Tue, 3 Feb 2015 05:27:35 +0000 (UTC)
From: "Srikanth Upputuri (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .

    [ https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302784#comment-14302784 ]

Srikanth Upputuri commented on HDFS-6753:
-----------------------------------------

A write request to the DN first looks for a disk volume with available space and then creates an rbw file on it. 'Check disk error' is triggered only when the rbw file cannot be created. If no volume with sufficient space can be found, the request simply throws an exception without initiating 'check disk error'. This is reasonable: even if no space is available on any volume, the DN may still be able to service read requests, so 'not enough space' is not by itself a sufficient condition for DN shutdown. However, if all the volumes subsequently become faulty, a later read request will detect that and shut the DN down anyway. Therefore there is no need to fix this behavior.
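To make the two write paths concrete, here is a minimal, self-contained Java sketch of the flow described above. It is a hypothetical simplification, not the actual DataNode source: only RoundRobinVolumeChoosingPolicy.chooseVolume and DiskChecker$DiskOutOfSpaceException correspond to real names from the stack trace quoted below; Volume, createRbw and writeBlock are illustrative stand-ins.

{noformat}
import java.io.IOException;
import java.util.List;

public class WritePathSketch {

    // Hypothetical stand-in for DiskChecker.DiskOutOfSpaceException.
    static class DiskOutOfSpaceException extends IOException {
        DiskOutOfSpaceException(String msg) { super(msg); }
    }

    // Hypothetical simplification of a data volume.
    static class Volume {
        final String dir;
        final long availableBytes;
        final boolean healthy; // e.g. directory permissions are OK
        Volume(String dir, long availableBytes, boolean healthy) {
            this.dir = dir;
            this.availableBytes = availableBytes;
            this.healthy = healthy;
        }
    }

    // Step 1 of a write: pick a volume with enough space. If none qualifies,
    // an exception is thrown immediately -- note that no 'check disk error'
    // is initiated on this path.
    static Volume chooseVolume(List<Volume> volumes, long blockSize)
            throws DiskOutOfSpaceException {
        long maxAvailable = 0;
        for (Volume v : volumes) {
            if (v.availableBytes >= blockSize) {
                return v;
            }
            maxAvailable = Math.max(maxAvailable, v.availableBytes);
        }
        throw new DiskOutOfSpaceException("Out of space: The volume with the most"
                + " available space (=" + maxAvailable + " B) is less than the"
                + " block size (=" + blockSize + " B).");
    }

    // Step 2 of a write: create the rbw file. A failure *here* is what
    // triggers 'check disk error' and, if every volume is bad, DN shutdown.
    static void createRbw(Volume v) throws IOException {
        if (!v.healthy) {
            throw new IOException("Cannot create rbw file on " + v.dir);
        }
    }

    static void writeBlock(List<Volume> volumes, long blockSize) {
        try {
            Volume v = chooseVolume(volumes, blockSize);
            createRbw(v);
        } catch (DiskOutOfSpaceException e) {
            // 'Not enough space' is not treated as a volume failure: the DN
            // may still serve reads, so no disk check and no shutdown here.
            System.out.println("Write rejected, no disk check: " + e.getMessage());
        } catch (IOException e) {
            // A faulty volume was actually touched; this is where a disk
            // check would run and could shut the DataNode down.
            System.out.println("rbw creation failed, disk check triggered: "
                    + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Scenario 1 from the report: the permission-denied volumes were
        // already dropped, leaving only the nearly full disk. chooseVolume
        // throws before any I/O happens, so the faulty disk is never probed.
        List<Volume> vols = List.of(new Volume("/mnt/tmp_Datanode", 4096L, false));
        writeBlock(vols, 134217728L);
    }
}
{noformat}

Under these assumptions the only path that can initiate 'check disk error' (and hence DN shutdown) is the rbw-creation failure; a later read against the faulty volume takes that path, which is why the DN still eventually shuts down.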
> When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6753
>                 URL: https://issues.apache.org/jira/browse/HDFS-6753
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: J.Andreina
>            Assignee: Srikanth Upputuri
>
> Env Details:
> ============
> Cluster has 3 Datanodes.
> Cluster installed with the "Rex" user.
> dfs.datanode.failed.volumes.tolerated = 3
> dfs.blockreport.intervalMsec = 18000
> dfs.datanode.directoryscan.interval = 120
> DN_XX1.XX1.XX1.XX1 data dir = /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
>
> Permission is denied on /home/REX/data/dfs1/data, /home/REX/data/dfs2/data and /opt/REX/dfs/data (hence the DN considered these volumes failed).
>
> Expected behavior is observed when the disk is not full:
> ========================================================
> Step 1: Change the permissions of /mnt/tmp_Datanode to root.
> Step 2: Perform write operations (the DN detects that all configured volumes have failed and shuts down).
>
> Scenario 1:
> ===========
> Step 1: Make the /mnt/tmp_Datanode disk full and change its permissions to root.
> Step 2: Perform client write operations (a disk-full exception is thrown, but the Datanode does not shut down, even though all the configured volumes have failed).
>
> {noformat}
> 2014-07-21 14:10:52,814 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=4096 B) is less than the block size (=134217728 B).
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
> {noformat}
>
> Observations:
> =============
> 1. Write operations do not shut down the Datanode, even though all the configured volumes have failed (one disk is full and permission is denied on all the disks).
> 2. Directory scanning fails, yet the DN does not shut down.
>
> {noformat}
> 2014-07-21 14:13:00,180 WARN org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured while compiling report:
> java.io.IOException: Invalid directory or I/O error occurred for dir: /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
>         at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
>         at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)