hadoop-common-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8761) Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
Date Tue, 04 Sep 2012 22:07:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448102#comment-13448102 ]

Andy Isaacson commented on HADOOP-8761:

bq. True, it's incompatible yet correct behavior. I guess our options are codify the bad behavior
forever, or fix it now.

Since this is client-side code, we could do something like:
# in 2.1.0, add %s and leave %b giving bytes, with a stderr message that %b will change in
a future release.
# in 2.2.0, change %b to give blocks and remove the stderr message.
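
The first step of that transition could look roughly like the following on the client side. This is a minimal sketch, not the actual Stat.java code: the class {{StatFormatSketch}}, the {{format}} method, and the warning text are all hypothetical. It only models the proposed 2.1.0 behavior, where %s is added and %b still prints bytes but warns on stderr.

```java
import java.io.PrintStream;

/** Hypothetical sketch of the proposed 2.1.0 behavior; not the real FsShell Stat code. */
public class StatFormatSketch {

    /**
     * Expands %s and %b (both as the file length in bytes) and %% in a
     * stat-style format string. Using %b prints a one-time warning to err.
     */
    public static String format(String fmt, long lengthBytes, PrintStream err) {
        StringBuilder out = new StringBuilder();
        boolean warned = false;
        for (int i = 0; i < fmt.length(); i++) {
            char c = fmt.charAt(i);
            // Copy ordinary characters, and a trailing lone '%', unchanged.
            if (c != '%' || i + 1 == fmt.length()) {
                out.append(c);
                continue;
            }
            char spec = fmt.charAt(++i);
            switch (spec) {
                case 's': // new specifier: size in bytes
                    out.append(lengthBytes);
                    break;
                case 'b': // legacy specifier: still bytes for now, but warn once
                    out.append(lengthBytes);
                    if (!warned) {
                        err.println("warning: %b prints bytes for now and will change"
                                + " in a future release; use %s for bytes");
                        warned = true;
                    }
                    break;
                case '%':
                    out.append('%');
                    break;
                default: // unknown specifier: pass it through untouched
                    out.append('%').append(spec);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "1024 1024" on stdout and the warning on stderr.
        System.out.println(format("%b %s", 1024L, System.err));
    }
}
```

Keeping the warning on stderr means scripts that parse stdout keep working through the transition release.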

bq. True, it's incompatible yet correct behavior.

I'm a little skeptical that "correct" with respect to POSIX stat(2) is significant here. The
"blocks" used in {{struct stat}} are very different from the "blocks" in HDFS; POSIX blocks
are fixed size, atomically written [1], and the block size is a legacy feature (hardcoded
to 512 bytes, which is not the underlying size of anything anymore). By contrast, HDFS blocks
are variable size, nonatomic, and much larger than POSIX blocks.

I agree that exposing the blocksize (and the blockcount) is a pretty valuable feature, but
there are a lot of caveats. Just off the top of my head: a single file can have multiple blocksizes.
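
To illustrate that caveat, here is a small sketch with made-up block lengths (not taken from any real file). It shows how a block count derived from the file length and a single nominal block size can disagree with the number of blocks actually stored. {{BlockCountSketch}} and its methods are hypothetical names, not part of any Hadoop API.

```java
/** Hypothetical illustration; the block lengths below are invented. */
public class BlockCountSketch {

    /** Block count estimated from total length and one nominal block size (ceiling division). */
    public static long estimatedBlocks(long totalBytes, long blockSize) {
        return (totalBytes + blockSize - 1) / blockSize;
    }

    /** Sum of the individual block lengths, i.e. the file length in bytes. */
    public static long totalLength(long[] blockLengths) {
        long total = 0;
        for (long len : blockLengths) {
            total += len;
        }
        return total;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        // Assumed layout: four blocks of differing sizes in one file.
        long[] blocks = {128 * mb, 30 * mb, 128 * mb, 5 * mb};
        long total = totalLength(blocks); // 291 MB
        // Prints "estimated=3 actual=4".
        System.out.println("estimated=" + estimatedBlocks(total, 128 * mb)
                + " actual=" + blocks.length);
    }
}
```

Under these assumptions, neither a single "block size" nor a length-derived "block count" describes the file accurately.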

[1] Blocks aren't guaranteed to be atomic by the POSIX spec AFAIK, but as a practical matter
modern implementations are atomic at some blocksize between 512 and 4096 bytes.

> Help for FsShell's Stat incorrectly mentions "file size in blocks" (should be bytes)
> ------------------------------------------------------------------------------------
>                 Key: HADOOP-8761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8761
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.0.0-alpha
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>            Priority: Minor
>         Attachments: HADOOP-8761.patch.txt
> Trivial patch attached corrects the usage information. Stat.java calls FileStatus.getLen(),
> which is most definitely the file size in bytes.
