hadoop-common-dev mailing list archives

From Kirby Bohling <kirby.bohl...@gmail.com>
Subject TestDU failures on common RedHat distributions
Date Tue, 08 Sep 2009 23:50:55 GMT
All,

   I was trying to get Hadoop compiling and passing its unit tests, and
I am having problems with the TestDU test.  I have searched the issue
tracker and googled around, but haven't found much on this.

From TEST-org.apache.hadoop.fs.TestDU.txt:

Testcase: testDU took 5.147 sec
    FAILED
expected:<32768> but was:<36864>
junit.framework.AssertionFailedError: expected:<32768> but was:<36864>
    at org.apache.hadoop.fs.TestDU.testDU(TestDU.java:79)
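
For what it's worth, here is a quick standalone repro, independent of the
Hadoop test harness (the class name and layout are mine, not lifted from
TestDU): it writes exactly 32768 bytes and compares File.length() against
what "du -sk" prints.  Pass it a directory argument so you control which
filesystem the file lands on; pointed at an ext3 home directory it should
show the same 36K figure described below, and 32K on tmpfs.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStreamReader;

// Quick repro, independent of TestDU: write exactly 32768 bytes to a file
// in the given directory and compare File.length() with what "du -sk"
// reports for it (du prints the size in KB).
public class DuRepro {
    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File f = File.createTempFile("dutest", ".dat", dir);
        f.deleteOnExit();

        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[32 * 1024]);     // 32768 bytes of real data
        out.close();

        Process p = new ProcessBuilder("du", "-sk", f.getAbsolutePath()).start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line = r.readLine();         // e.g. "36<tab>/home/.../dutest42.dat"
        p.waitFor();
        long duKb = Long.parseLong(line.trim().split("\\s+")[0]);

        System.out.println("length = " + f.length() + " bytes, du -sk = " + duKb + " KB");
    }
}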

Source version:
https://svn.apache.org/repos/asf/hadoop/common/trunk@812652

I found the following issues, but none of them covered the problems I am seeing:
http://issues.apache.org/jira/browse/HADOOP-2813
http://issues.apache.org/jira/browse/HADOOP-2845
http://issues.apache.org/jira/browse/HADOOP-2927

I also saw this e-mail, which I think is misdiagnosing the issue:
http://markmail.org/message/tlmb63rgootn4ays

I think Konstantin has it exactly backwards.  I see exactly the same
behavior on both an up-to-date CentOS 5.3 and a Fedora 11 install:
tmpfs is doing what Hadoop expects, while ext3 is causing the unit
test failures.  A file that is exactly 32K (as created by the unit
test) is 36K according to "du -sk" on an ext3 partition.  On a tmpfs
(in-memory) partition, it is 32K as one would "expect".

If I run "ant test -Dtest.build.data=/tmp/hadoop/test", so that the du
calls happen on a tmpfs filesystem, all of the tests pass.  If I run
plain "ant test" and the data files end up under
"/home/kbohling/hadoop/build/tmp/data/dutmp/", the unit tests fail.

Running all of these on my Fedora 11 machine:

$ ls -l /tmp/data /home/shared/data
-rw-rw-r--. 1 kbohling shared         32768 2009-09-08 15:34 /home/shared/data
-rw-rw-r--. 1 kbohling kbohling 32768 2009-09-08 15:33 /tmp/data

$ stat /tmp/data /home/shared/data  | egrep "Blocks|File"
  File: `/tmp/data'
  Size: 32768     	Blocks: 64         IO Block: 4096   regular file
  File: `/home/shared/data'
  Size: 32768     	Blocks: 72         IO Block: 4096   regular file

NOTE: stat reports blocks in 512-byte units, so the 64 blocks correspond
to the 32K file (64 * 512 = 32768) and the 72 blocks correspond to a 36K
file (72 * 512 = 36864, exactly the value in the test failure).

$ df -h /tmp /home/shared
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                 2.0G  1.9M  2.0G   1% /tmp
/dev/mapper/stdfs-home
                       19G   17G  1.4G  93% /home

$ cat /proc/mounts | egrep "/home|/tmp"
/dev/mapper/stdfs-home /home ext3
rw,relatime,errors=continue,user_xattr,acl,data=ordered 0 0
tmpfs /tmp tmpfs rw,rootcontext=system_u:object_r:tmp_t:s0,relatime 0 0

# dumpe2fs -h /dev/mapper/stdfs-home | grep "Block size"
dumpe2fs 1.41.4 (27-Jan-2009)
Block size:               4096

# dumpe2fs -h /dev/mapper/stdfs-home | grep "Filesystem features"
dumpe2fs 1.41.4 (27-Jan-2009)
Filesystem features:      has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file


Running just the ext3 checks on a CentOS 5.3 machine (the tmpfs
results are the same as on Fedora):

# ls -l /root/data
-rw-r--r-- 1 root root 32768 Sep  8 15:54 /root/data

# stat /root/data | egrep "File|Blocks"
  File: `/root/data'
  Size: 32768     	Blocks: 72         IO Block: 4096   regular file

# cat /proc/mounts | grep "root"
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0

# dumpe2fs -h /dev/root  | grep "Block size"
dumpe2fs 1.39 (29-May-2006)
Block size:               4096

# dumpe2fs -h /dev/root  | grep "Filesystem features"
dumpe2fs 1.39 (29-May-2006)
Filesystem features:      has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

I'm not sure why this is happening on RedHat's variants of Linux (I
don't have an Ubuntu, SuSE, etc. machine handy, otherwise I'd check
those as well).  It definitely appears that the ext3 filesystem is
reporting 36K worth of blocks for the file, not 32K.

Okay, a bit more checking: running the following on Fedora to try a
filesystem without extended attributes:
# lvcreate --name foo --size 1G /dev/stdfs

# mke2fs -j -O ^ext_attr /dev/stdfs/foo

# mount /dev/stdfs/foo /mnt

# cp /home/shared/data /mnt/.

# stat /mnt/data | egrep "File|Blocks"
  File: `/mnt/data'
  Size: 32768     	Blocks: 64         IO Block: 4096   regular file

NOTE: the same file that is 36K on /home/shared is 32K here.

So it looks like Hadoop's unit tests will fail if they run on any ext3
filesystem that has the "ext_attr" feature enabled.  It might be nice to
note that in a comment in the code, if it can't be detected at runtime
during the tests.  It also looks like you could create 32K, 64K, and
128K files and deem the result "acceptable" if they are all off by
exactly 4K or so.  In other words, if the sizes don't match exactly but
are all off by exactly one block size, that might be okay (rough sketch
of a simpler variant below).
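
Something like the following is what I have in mind.  It is just a sketch
of the simpler variant (tolerate up to one block of slack per file, rather
than requiring the offsets to agree across several sizes); the class,
method, and field names are mine, not the real TestDU code, and the
4096-byte block size is assumed rather than detected:

import junit.framework.TestCase;

// Sketch only, not the real TestDU code.  The idea: accept the du-reported
// size when it exceeds the number of bytes written by no more than one
// filesystem block, which covers the extra block that ext3 with ext_attr
// appears to charge to the file.
public class DuSlackCheck extends TestCase {
    private static final long BLOCK_SIZE = 4096;   // assumed block size

    static void assertDuSize(long writtenSize, long duSize) {
        long slack = duSize - writtenSize;
        assertTrue("du reported " + duSize + " bytes for a " + writtenSize
                   + " byte file (slack " + slack + ")",
                   slack >= 0 && slack <= BLOCK_SIZE);
    }

    public void testSlackWithinOneBlock() {
        assertDuSize(32768, 36864);   // the ext3 case above would now pass
        assertDuSize(32768, 32768);   // the tmpfs case passes unchanged
    }
}

Hard-coding 4096 is the weak spot; if the actual block size can be
determined at test time, that would obviously be better.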

I can try and boil this down and file a JIRA issue if that is appropriate.

Thanks,
    Kirby
