hadoop-hdfs-issues mailing list archives

From "Zhong Wang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
Date Fri, 05 Nov 2010 02:48:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928455#action_12928455 ]

Zhong Wang commented on HDFS-1377:
----------------------------------

I made a link from HDFS-1487. I ran into the same problem while testing that issue. I think I can fix this soon and attach a patch. Any comments?

> Quota bug for partial blocks allows quotas to be violated 
> ----------------------------------------------------------
>
>                 Key: HDFS-1377
>                 URL: https://issues.apache.org/jira/browse/HDFS-1377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>
> There's a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here's an example:
> {code}
> $ hadoop fs -mkdir /test
> $ hadoop dfsadmin -setSpaceQuota 384M /test
> $ ls dir/ | wc -l   # dir contains 101 files
> 101
> $ du -ms dir        # each is 3MB
> 304	dir
> $ hadoop fs -put dir /test
> $ hadoop fs -count -q /test
>         none             inf       402653184      -550502400            2          101          317718528 hdfs://haus01.sf.cloudera.com:10020/test
> $ hadoop fs -stat "%o %r" /test/dir/f30
> 134217728 3    # 128MB block size, replication factor 3
> {code}
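> (A sanity check on those numbers, added here rather than taken from the output: the negative remaining space quota matches 101 files * 3MB * replication 3 = 909MB of raw usage as computed by getContentSummary.)
> {code}
> 402653184 - 101 * 3145728 * 3 = 402653184 - 953155584 = -550502400
> {code}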
> INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative).
> This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However, the bug isn't visible to users because {{fs -count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.
> In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas:
> {code}
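> // conservatively charge the full preferred block size for each replica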
> updateCount(inodes, inodes.length-1, 0,
>     fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
> {code}
> In FSNameSystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the estimate and the number of bytes actually stored:
> {code}
> //Updated space consumed if required.
> INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
> long diff = (file == null) ? 0 :
>     (file.getPreferredBlockSize() - storedBlock.getNumBytes());
> if (diff > 0 && file.isUnderConstruction() &&
>     cursize < storedBlock.getNumBytes()) {
> ...
>     dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
> {code}
> We do the same in FSDirectory#replaceNode when completing the file, but at file granularity (I believe the intent here is to correct for cases where there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (versus Block#getNumBytes, used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored:
> {code}
> long dsOld = oldnode.diskspaceConsumed();
> ...
> //check if disk space needs to be updated.
> long dsNew = 0;
> if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew-dsOld);
> ...
> {code}
> So in the above example we start with diskspace at 384MB (3 * 128MB) and then subtract 375MB (to reflect that only 9MB raw was actually used) twice, so for each file the diskspace charged to the directory ends up at -366MB (384MB minus 2 * 375MB). This is why the consumed space goes negative and yet we can still write more files.
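> Spelling out the double subtraction for one 3MB file (a sketch in plain Java arithmetic mirroring the calls above; the variable names are illustrative only):
> {code}
> long blockSize = 134217728L;                  // preferred block size (128MB)
> long actual    = 3L * 1024 * 1024;            // bytes actually written (3MB)
> long repl      = 3;                           // replication factor
>
> long charged = blockSize * repl;              // 402653184 (384MB), added in FSDirectory#addBlock
> long diff    = (blockSize - actual) * repl;   // 393216000 (375MB)
> // subtracted once in addStoredBlock and again in replaceNode:
> long result  = charged - diff - diff;         // -383778816 (-366MB) instead of 9437184 (9MB)
> {code}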
> So a directory with lots of single-block files ends up having a quota count that's way off (if a file has multiple blocks, only the final partial block ends up subtracting from the diskspace used).
> I think the fix is for FSDirectory#replaceNode to not have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch that also adds a quota test for files that are not exact multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not match the cached value.
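> A rough sketch of that check in replaceNode (hypothetical code, not the actual patch; the block comparison shown is only illustrative):
> {code}
> // If the old and new INode share the same block list, the space was
> // already corrected in addStoredBlock, so skip the second adjustment.
> boolean sameBlocks = Arrays.equals(oldnode.getBlocks(), newnode.getBlocks());
> long dsNew = 0;
> if (updateDiskspace && !sameBlocks &&
>     (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew - dsOld);
> ...
> {code}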

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

