hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1377) Quota bug for partial blocks allows quotas to be violated
Date Tue, 23 Nov 2010 03:28:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934722#action_12934722

Eli Collins commented on HDFS-1377:

Thanks for taking a look Raghu!

bq. Not sure how it manifests in real life, but here a file is going from INodeFile to INodeFIleUnderConstruction.
So the actual space consumed should be rounded upwards to the block boundery. Add stored block
is not called at this time.

The sequence of events is:

* A client starts appending to an existing file
* dfsadmin -saveNamespace  (persists the INodeFileUnderConstruction)
* The NN restarts, when loading the image we replace the old INode with the persisted INodeFileUnderConstruction.

Since disk usage is not persisted in the image, is calculated as DNs report blocks on startup,
I don't think this path should update disk usage. 

bq. How about calling replaceNode(..., updateDiskspace = false) in finalizeFileUnderconstruction(),
since we know that oldNode is not under construction anymore?

I think we can remove the disk updating code there entirely, even on trunk, and add tests
to make sure the block-level accounting is working.

bq. btw, do we separate test file for this? TestQuota.java is already supposed to test violations
(FWIW, it does test some violations).

I had the put them in a separate class so they could share setup and teardown (vs TestQuota
which has setup/teardown code for each test). I'll update the patches to add the tests to
TestQuota so they're all in one place.

> Quota bug for partial blocks allows quotas to be violated 
> ----------------------------------------------------------
>                 Key: HDFS-1377
>                 URL: https://issues.apache.org/jira/browse/HDFS-1377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0, 0.23.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0, 0.23.0
>         Attachments: hdfs-1377-b20-1.patch, hdfs-1377-b20-2.patch, HDFS-1377.patch
> There's a bug in the quota code that causes them not to be respected when a file is not
an exact multiple of the block size. Here's an example:
> {code}
> $ hadoop fs -mkdir /test
> $ hadoop dfsadmin -setSpaceQuota 384M /test
> $ ls dir/ | wc -l   # dir contains 101 files
> 101
> $ du -ms dir        # each is 3mb
> 304	dir
> $ hadoop fs -put dir /test
> $ hadoop fs -count -q /test
>         none             inf       402653184      -550502400            2          101
         317718528 hdfs://haus01.sf.cloudera.com:10020/test
> $ hadoop fs -stat "%o %r" /test/dir/f30
> 134217728 3    # three 128mb blocks
> {code}
> INodeDirectoryWithQuota caches the number of bytes consumed by it's children in {{diskspace}}.
The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when
a file is not an exact multiple of the block size (the value ends up being negative). 
> This causes the quota checking code to think that the files in the directory consumes
less space than they actually do, so the verifyQuota does not throw a QuotaExceededException
even when the directory is over quota. However the bug isn't visible to users because {{fs
count -q}} reports the numbers generated by INode#getContentSummary which adds up the sizes
of the blocks rather than use the cached INodeDirectoryWithQuota#diskspace value.
> In FSDirectory#addBlock the disk space consumed is set conservatively to the full block
size * the number of replicas:
> {code}
> updateCount(inodes, inodes.length-1, 0,
>     fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
> {code}
> In FSNameSystem#addStoredBlock we adjust for this conservative estimate by subtracting
out the difference between the conservative estimate and what the number of bytes actually
stored was:
> {code}
> //Updated space consumed if required.
> INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
> long diff = (file == null) ? 0 :
>     (file.getPreferredBlockSize() - storedBlock.getNumBytes());
> if (diff > 0 && file.isUnderConstruction() &&
>     cursize < storedBlock.getNumBytes()) {
> ...
>     dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
> {code}
> We do the same in FSDirectory#replaceNode when completing the file, but at a file granularity
(I believe the intent here is to correct for the cases when there's a failure replicating
blocks and recovery). Since oldnode is under construction INodeFile#diskspaceConsumed will
use the preferred block size  (vs of Block#getNumBytes used by newnode) so we will again subtract
out the difference between the full block size and what the number of bytes actually stored
> {code}
> long dsOld = oldnode.diskspaceConsumed();
> ...
> //check if disk space needs to be updated.
> long dsNew = 0;
> if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew-dsOld);
> ...
> {code}
> So in the above example we started with diskspace at 384mb (3 * 128mb) and then we subtract
375mb (to reflect only 9mb raw was actually used) twice so for each file the diskspace for
the directory is - 366mb (384mb minus 2 * 375mb). Which is why the quota gets negative and
yet we can still write more files.
> So a directory with lots of single block files (if you have multiple blocks on the final
partial block ends up subtracting from the diskspace used) ends up having a quota that's way
> I think the fix is to in FSDirectory#replaceNode not have the diskspaceConsumed calculations
differ when the old and new INode have the same blocks. I'll work on a patch which also adds
a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary
if the computed size does not reflect the cached value.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message