From: Kirby Bohling <kirby.bohling@gmail.com>
Date: Tue, 8 Sep 2009 18:50:55 -0500
Subject: TestDU failures on common RedHat distributions
To: common-dev@hadoop.apache.org

All,

I was trying to get Hadoop compiling and passing unit tests, and I am
having problems with the TestDU test. I have searched the issue tracker
and googled around the web, but haven't found much on this issue.

From TEST-org.apache.hadoop.fs.TestDU.txt:

Testcase: testDU took 5.147 sec
        FAILED
expected:<32768> but was:<36864>
junit.framework.AssertionFailedError: expected:<32768> but was:<36864>
        at org.apache.hadoop.fs.TestDU.testDU(TestDU.java:79)

Source version: https://svn.apache.org/repos/asf/hadoop/common/trunk@812652

I found the following issues, but none of them covers the problem I am
seeing:

http://issues.apache.org/jira/browse/HADOOP-2813
http://issues.apache.org/jira/browse/HADOOP-2845
http://issues.apache.org/jira/browse/HADOOP-2927

I also saw this e-mail, which I think misdiagnoses the issue:

http://markmail.org/message/tlmb63rgootn4ays

I think Konstantin has it exactly backwards. I see the same behavior on
both an updated CentOS 5.3 and a Fedora 11 install: tmpfs is doing what
Hadoop expects, and ext3 is causing the unit test failures. A file that
is exactly 32K (as created by the unit test) is 36K according to
"du -sk" on an ext3 partition; on tmpfs (an in-memory filesystem) it is
32K, as one would "expect".

If I run "ant test -Dtest.build.data=/tmp/hadoop/test", so that the du
calls happen on a tmpfs filesystem, all of the tests pass.
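For anyone who wants to reproduce the mismatch without running the whole
test suite, here is a minimal sketch (mktemp just picks a file on
whatever filesystem backs /tmp; run it against both an ext3 mount and a
tmpfs mount to see the difference):

```shell
# Create a file of exactly 32K, the same size as TestDU's test file.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 count=32 2>/dev/null

# The logical size is always 32768 bytes...
size=$(wc -c < "$f")

# ...but the usage du reports depends on the filesystem: 32 on tmpfs or
# a plain ext3, 36 on an ext3 with the ext_attr feature enabled.
usage=$(du -sk "$f" | cut -f1)

echo "size=${size} bytes, du=${usage}K"
rm -f "$f"
```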
If I run "ant test", and the data files end up inside
"/home/kbohling/hadoop/build/tmp/data/dutmp/", the unit tests fail.

Running all of these on my Fedora 11 machine:

$ ls -l /tmp/data /home/shared/data
-rw-rw-r--. 1 kbohling shared   32768 2009-09-08 15:34 /home/shared/data
-rw-rw-r--. 1 kbohling kbohling 32768 2009-09-08 15:33 /tmp/data

$ stat /tmp/data /home/shared/data | egrep "Blocks|File"
  File: `/tmp/data'
  Size: 32768   Blocks: 64   IO Block: 4096   regular file
  File: `/home/shared/data'
  Size: 32768   Blocks: 72   IO Block: 4096   regular file

NOTE: 64 (512-byte) blocks corresponds to a 32K file; 72 blocks
corresponds to a 36K file.

$ df -h /tmp /home/shared
Filesystem              Size  Used Avail Use% Mounted on
tmpfs                   2.0G  1.9M  2.0G   1% /tmp
/dev/mapper/stdfs-home   19G   17G  1.4G  93% /home

$ cat /proc/mounts | egrep "/home|/tmp"
/dev/mapper/stdfs-home /home ext3 rw,relatime,errors=continue,user_xattr,acl,data=ordered 0 0
tmpfs /tmp tmpfs rw,rootcontext=system_u:object_r:tmp_t:s0,relatime 0 0

# dumpe2fs -h /dev/mapper/stdfs-home | grep "Block size"
dumpe2fs 1.41.4 (27-Jan-2009)
Block size:               4096

# dumpe2fs -h /dev/mapper/stdfs-home | grep "Filesystem features"
dumpe2fs 1.41.4 (27-Jan-2009)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file

Running just the ext3 checks on a CentOS 5.3 machine (the tmpfs results
are all the same as on Fedora):

# ls -l /root/data
-rw-r--r-- 1 root root 32768 Sep  8 15:54 /root/data

# stat /root/data | egrep "File|Blocks"
  File: `/root/data'
  Size: 32768   Blocks: 72   IO Block: 4096   regular file

# cat /proc/mounts | grep "root"
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0

# dumpe2fs -h /dev/root | grep "Block size"
dumpe2fs 1.39 (29-May-2006)
Block size:               4096

# dumpe2fs -h /dev/root | grep "Filesystem features"
dumpe2fs 1.39 (29-May-2006)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file

I'm not sure why this is happening on RedHat's
variants of Linux (I don't have an Ubuntu, SuSE, etc. machine handy,
otherwise I'd check those as well). It definitely appears that the ext3
kernel calls are reporting 36K of usage for the file, not 32K.

Okay, a bit more checking. Running all of this on Fedora to try a
filesystem without extended attributes:

# lvcreate --name foo --size 1G /dev/stdfs
# mke2fs -j -O ^ext_attr /dev/stdfs/foo
# mount /dev/stdfs/foo /mnt
# cp /home/shared/data /mnt/.
# stat /mnt/data | egrep "File|Blocks"
  File: `/mnt/data'
  Size: 32768   Blocks: 64   IO Block: 4096   regular file

NOTE: the same file that is 36K on /home/shared is now 32K.

So it looks like Hadoop's unit tests will fail on any ext3 filesystem
that has the "ext_attr" feature enabled. It might be nice to note that
in a comment in the code, if it can't be detected at runtime during the
tests. It also looks like you could create a 32K, a 64K, and a 128K file
and deem the result "acceptable" if they are all off by exactly 4K or
so: if the sizes don't match exactly, but every file is off by exactly
one block, that might be okay.

I can try to boil this down and file a JIRA issue if that is
appropriate.

Thanks,
Kirby
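P.S. A rough shell sketch of the tolerance check I described above (the
mktemp scratch directory and the exact 32K/64K/128K sizes are just my
illustrative choices, not what TestDU currently does):

```shell
# Create 32K, 64K, and 128K files in a scratch directory and compare the
# size du reports against the logical size. If every file is off by the
# same constant, that constant is per-file metadata overhead (e.g. the
# extra 4K block an ext_attr ext3 charges to the file), not a du bug.
dir=$(mktemp -d)
for kb in 32 64 128; do
    dd if=/dev/zero of="$dir/data$kb" bs=1024 count=$kb 2>/dev/null
done

off32=$((  $(du -sk "$dir/data32"  | cut -f1) - 32  ))
off64=$((  $(du -sk "$dir/data64"  | cut -f1) - 64  ))
off128=$(( $(du -sk "$dir/data128" | cut -f1) - 128 ))

echo "per-file overheads: ${off32}K ${off64}K ${off128}K"
rm -rf "$dir"
```

On my machines this prints "0K 0K 0K" on tmpfs and "4K 4K 4K" on the
ext_attr ext3 partitions; an unequal set of overheads would point at a
genuine du problem.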