Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 5 Jun 2014 07:33:01 +0000 (UTC)
From: "stanley shi (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Created] (HDFS-6489) DFS Used space is not correct if there're many append operations

stanley shi created HDFS-6489:
---------------------------------

             Summary: DFS Used space is not correct if there're many append operations
                 Key: HDFS-6489
                 URL: https://issues.apache.org/jira/browse/HDFS-6489
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.2.0
            Reporter: stanley shi

The current implementation of the DataNode increases the DFS Used space on each block write operation. This is correct in most scenarios (creating a new file), but it behaves incorrectly when appending a small amount of data to a large block.

For example, take a file with only one block (say, 60M). If I append to it very frequently, but only 10 bytes at a time, then on each append DFS Used is increased by the length of the block (60M), not by the actual data length (10 bytes); see the reproduction sketch below.

Now consider a scenario where many clients append concurrently to a large number of files (1000+). Assuming a block size of 32M (half of the default value), DFS Used is increased by 1000 * 32M = 32G on each round of appends, even though only about 10K bytes are actually written. This causes the DataNode to report insufficient disk space on data writes:

{quote}2014-06-04 15:27:34,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, FINALIZED{quote}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
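For illustration only (not part of the original report), here is a minimal Java sketch of the scenario described above. It assumes fs.defaultFS points at a running HDFS cluster reachable through the default Configuration; the path, file size, and iteration count are made-up values.

{code:java}
// Hypothetical reproduction sketch for HDFS-6489 (not from the original report).
// Assumes a running HDFS cluster and a writable path; sizes are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendDfsUsedRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/append-dfsused-repro"); // hypothetical test path

    // Create a file whose data sits in a single large block (~60M).
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write(new byte[60 * 1024 * 1024]);
    }

    // Append a tiny amount of data many times. Per the report, after each
    // append the DataNode's "DFS Used" (e.g. via "hdfs dfsadmin -report")
    // grows by roughly the full block length rather than the 10 bytes written.
    for (int i = 0; i < 100; i++) {
      try (FSDataOutputStream out = fs.append(file)) {
        out.write(new byte[10]);
      }
    }
  }
}
{code}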