From: "Ruan Weichao (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
Date: Mon, 8 Nov 2010 22:00:10 -0500 (EST)
Message-ID: <984113.93821289271610052.JavaMail.jira@thor>
In-Reply-To: <31342559.33441283711673363.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929908#action_12929908 ]

Ruan Weichao
commented on HDFS-1377:
------------------------------------

So what is the original idea of the code to be removed in your patch?

> Quota bug for partial blocks allows quotas to be violated
> ----------------------------------------------------------
>
>                 Key: HDFS-1377
>                 URL: https://issues.apache.org/jira/browse/HDFS-1377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HDFS-1377.patch
>
>
> There's a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here's an example:
> {code}
> $ hadoop fs -mkdir /test
> $ hadoop dfsadmin -setSpaceQuota 384M /test
> $ ls dir/ | wc -l    # dir contains 101 files
> 101
> $ du -ms dir         # each is 3mb
> 304     dir
> $ hadoop fs -put dir /test
> $ hadoop fs -count -q /test
> none  inf  402653184  -550502400  2  101  317718528  hdfs://haus01.sf.cloudera.com:10020/test
> $ hadoop fs -stat "%o %r" /test/dir/f30
> 134217728 3    # three 128mb blocks
> {code}
> INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to be updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative).
> This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However, the bug isn't visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.
> In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas:
> {code}
> updateCount(inodes, inodes.length-1, 0,
>     fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
> {code}
> In FSNamesystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the conservative estimate and the number of bytes actually stored:
> {code}
> // Update space consumed if required.
> INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
> long diff = (file == null) ? 0 :
>     (file.getPreferredBlockSize() - storedBlock.getNumBytes());
> if (diff > 0 && file.isUnderConstruction() &&
>     cursize < storedBlock.getNumBytes()) {
>   ...
>   dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
> {code}
> We do the same in FSDirectory#replaceNode when completing the file, but at file granularity (I believe the intent here is to correct for cases where there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs. Block#getNumBytes used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored:
> {code}
> long dsOld = oldnode.diskspaceConsumed();
> ...
> // check if disk space needs to be updated.
> long dsNew = 0;
> if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew-dsOld);
>     ...
> {code}
> So in the above example we start with diskspace at 384mb (3 * 128mb) and then subtract 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace charged to the directory is -366mb (384mb minus 2 * 375mb). This is why the quota goes negative and yet we can still write more files.
> So a directory with lots of single-block files (if a file has multiple blocks, only the final partial block ends up subtracting from the diskspace used) ends up having a quota that's way off.
> I think the fix is, in FSDirectory#replaceNode, to not have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
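[Editor's note: the double subtraction described in the quoted report can be sketched numerically. This is an illustrative standalone program, not HDFS source; the class and variable names are assumptions, and only the constants (128mb preferred block size, 3mb file, replication 3) come from the example above.]

```java
// Illustrative sketch (not HDFS code) of the accounting bug in HDFS-1377:
// the unused tail of a partial block is subtracted twice, once in
// addStoredBlock and once again in replaceNode.
public class QuotaDoubleSubtraction {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // preferred block size (128mb)
        long actualBytes = 3L * 1024 * 1024;  // bytes actually written (3mb)
        short replication = 3;

        // FSDirectory#addBlock: conservative charge of a full block per replica.
        long diskspace = blockSize * replication;             // 384mb

        // FSNamesystem#addStoredBlock: subtract the unused tail once...
        long diff = (blockSize - actualBytes) * replication;  // 375mb
        diskspace -= diff;                                    // 9mb: correct

        // FSDirectory#replaceNode: ...and, due to the bug, subtract it again.
        diskspace -= diff;                                    // -366mb: negative

        System.out.println(diskspace / (1024 * 1024) + " mb"); // prints "-366 mb"
    }
}
```

With 101 such files the cached {{diskspace}} goes to roughly -366mb * 101, matching the negative remaining-quota value (-550502400) shown by {{fs -count -q}} in the example.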