Date: Thu, 21 Apr 2011 23:33:13 +0000 (UTC)
From: "Hudson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1717480857.75303.1303428793676.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm

[ 
https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023025#comment-13023025 ]

Hudson commented on HDFS-1377:
------------------------------

Integrated in Hadoop-Hdfs-22-branch #35 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/])

> Quota bug for partial blocks allows quotas to be violated
> ----------------------------------------------------------
>
> Key: HDFS-1377
> URL: https://issues.apache.org/jira/browse/HDFS-1377
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0, 0.23.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20.3, 0.21.1, Federation Branch, 0.22.0, 0.23.0
>
> Attachments: HDFS-1377.patch, hdfs-1377-1.patch, hdfs-1377-b20-1.patch, hdfs-1377-b20-2.patch, hdfs-1377-b20-3.patch
>
>
> There's a bug in the quota code that causes quotas not to be respected when a file's size is not an exact multiple of the block size. Here's an example:
> {code}
> $ hadoop fs -mkdir /test
> $ hadoop dfsadmin -setSpaceQuota 384M /test
> $ ls dir/ | wc -l # dir contains 101 files
> 101
> $ du -ms dir # each is 3mb
> 304 dir
> $ hadoop fs -put dir /test
> $ hadoop fs -count -q /test
> none inf 402653184 -550502400 2 101 317718528 hdfs://haus01.sf.cloudera.com:10020/test
> $ hadoop fs -stat "%o %r" /test/dir/f30
> 134217728 3 # 128mb block size, replication 3
> {code}
> INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative).
> This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota.
> However, the bug isn't visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.
> In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas:
> {code}
> updateCount(inodes, inodes.length-1, 0,
>     fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
> {code}
> In FSNamesystem#addStoredBlock we adjust for this conservative estimate by subtracting the difference between the conservative estimate and the number of bytes actually stored:
> {code}
> //Updated space consumed if required.
> INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
> long diff = (file == null) ? 0 :
>     (file.getPreferredBlockSize() - storedBlock.getNumBytes());
> if (diff > 0 && file.isUnderConstruction() &&
>     cursize < storedBlock.getNumBytes()) {
> ...
>   dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
> {code}
> We do the same in FSDirectory#replaceNode when completing the file, but at file granularity (I believe the intent here is to correct for cases where there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs. the Block#getNumBytes used by newnode), so we will again subtract the difference between the full block size and the number of bytes actually stored:
> {code}
> long dsOld = oldnode.diskspaceConsumed();
> ...
> //check if disk space needs to be updated.
> long dsNew = 0;
> if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew-dsOld);
> ...
> {code}
> So in the above example we started with diskspace at 384mb (3 * 128mb) and then subtracted 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace for the directory is -366mb (384mb minus 2 * 375mb). This is why the quota gets negative and yet we can still write more files.
> So a directory with lots of single-block files (if you have multiple blocks, only the final partial block ends up subtracting from the diskspace used) ends up having a quota that's way off.
> I think the fix is, in FSDirectory#replaceNode, to not have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
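
The per-file arithmetic in the quoted description can be replayed with a small standalone sketch. This is not Hadoop code: the class name QuotaDriftSketch and the helper cachedDelta are hypothetical, and the sketch assumes a single-block file whose partial-block adjustment is applied either once (the intended behavior) or twice (the double subtraction described in the issue).

```java
/**
 * Standalone sketch, not Hadoop code. All names are hypothetical.
 * Replays the cached-diskspace accounting for one single-block file.
 */
public class QuotaDriftSketch {
    static final long MB = 1024L * 1024L;

    /**
     * Change in a directory's cached diskspace caused by one single-block file,
     * when the partial-block adjustment is applied `adjustments` times.
     */
    static long cachedDelta(long blockSize, long fileSize, int replication, int adjustments) {
        // Conservative charge made when the block is allocated: full block * replicas.
        long reserved = blockSize * replication;
        // Unused part of the block across all replicas, subtracted on each adjustment.
        long diff = (blockSize - fileSize) * replication;
        return reserved - adjustments * diff;
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;   // preferred block size from the example
        long fileSize = 3 * MB;      // each file in the example is 3mb
        int replication = 3;
        int files = 101;

        long actual = fileSize * replication;                           // 9 MB really consumed
        long once = cachedDelta(blockSize, fileSize, replication, 1);   // 9 MB
        long twice = cachedDelta(blockSize, fileSize, replication, 2);  // -366 MB

        System.out.println("actual per file:            " + actual / MB + " MB");
        System.out.println("cached, adjusted once:      " + once / MB + " MB");
        System.out.println("cached, adjusted twice:     " + twice / MB + " MB");
        System.out.println("actual for all files:       " + files * actual / MB + " MB");
        System.out.println("cached for all files (bug): " + files * twice / MB + " MB");
    }
}
```

With the figures from the example (128mb blocks, 3mb files, replication 3), one adjustment leaves 9mb charged per file and two leave -366mb. The cached total for 101 files therefore stays far below the 384mb quota even though 909mb is actually consumed.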