hadoop-common-dev mailing list archives

From "Brian Bockelman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4298) File corruption when reading with fuse-dfs
Date Tue, 30 Sep 2008 20:55:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635853#action_12635853 ]

Brian Bockelman commented on HADOOP-4298:
-----------------------------------------

Hey Pete - 

Regarding the comment on "29/Sep/08 11:10 PM": Yes, I misspoke.  I had adjusted the libhdfs makefile to use -m64.

Regarding the later comment, "30/Sep/08 01:09 PM":  As you mention, FUSE appears to be unhappy
with this behavior - I always get the asked-for number of bytes returned.

Besides, the Unix definition of 'read' (http://www.opengroup.org/onlinepubs/000095399/functions/read.html) states, regarding the return value:

"""
This number shall never be greater than nbyte. The value returned may be less than nbyte if
the number of bytes left in the file is less than nbyte, if the read() request was interrupted
by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes
immediately available for reading. For example, a read() from a file associated with a terminal
may return one typed line of data.
"""

To me, that means that if there are more than nbyte bytes left in the file, you should return exactly nbyte; I guess I can see that if you have fewer than nbyte bytes in the buffer, you could claim it meets the requirement of "fewer than nbyte bytes immediately available for reading".

I guess you still end up being stuck with the FUSE problem.
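
For what it's worth, the way I'd picture avoiding short reads on the FUSE side is a read handler that keeps calling hdfsPread until the buffer is full or a real EOF shows up.  This is just a sketch on my end, not the actual fuse-dfs code; dfs_read and the fs/file handles are placeholders for whatever per-open state fuse-dfs actually keeps:

#define FUSE_USE_VERSION 26
#include <errno.h>
#include <fuse.h>
#include "hdfs.h"

/* Placeholders: in fuse-dfs proper these would come from the mount/open
 * state, not globals.  They're only here so the sketch stands alone. */
static hdfsFS   fs;
static hdfsFile file;

static int dfs_read(const char *path, char *buf, size_t size, off_t offset,
                    struct fuse_file_info *fi)
{
    size_t total = 0;
    while (total < size) {
        tSize n = hdfsPread(fs, file, (tOffset)(offset + total),
                            buf + total, (tSize)(size - total));
        if (n < 0)
            return -EIO;   /* pass read errors up to FUSE */
        if (n == 0)
            break;         /* genuine end of file */
        total += (size_t)n;
    }
    /* FUSE treats a return value smaller than size as end of file,
     * so only return short when the file really ended. */
    return (int)total;
}

That kind of loop would keep FUSE happy while still tolerating hdfsPread handing back less than it was asked for.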

Regarding exceptionally large reads (> 10MB): it might be a problem, but I can't trigger
it locally.

Finally, I think that the last time my application failed, I hadn't remounted the FS with the new code.  I tried it out this morning and was pleased to see things work all the way through, with no segfaults.

Thanks for the help.  This goes a long way toward "selling" the idea of a new distributed file system to our sysadmins.

Brian

> File corruption when reading with fuse-dfs
> ------------------------------------------
>
>                 Key: HADOOP-4298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4298
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.18.1
>         Environment: CentOS 4.6 final; kernel 2.6.9-67.ELsmp; FUSE 2.7.4; hadoop 0.18.1; 64-bit
> I hand-altered the fuse-dfs makefile to use 64-bit instead of the hardcoded -m32.
>            Reporter: Brian Bockelman
>            Priority: Critical
>             Fix For: 0.18.2
>
>
> I pulled a 5GB data file into Hadoop using the following command:
> hadoop fs -put /scratch/886B9B3D-6A85-DD11-A9AB-000423D6CA6E.root /user/brian/testfile
> I have HDFS mounted in /mnt/hadoop using fuse-dfs.
> However, when I try to md5sum the file in place (md5sum /mnt/hadoop) or copy the file back to local disk using "cp" and then md5sum it, the checksum is incorrect.
> When I pull the file using normal Hadoop means (hadoop fs -get /user/brian/testfile /scratch), the md5sum is correct.
> When I repeat the test with a smaller file (512MB, on the theory that there is a problem with some 2GB limit somewhere), the problem remains.
> When I repeat the test, the md5sum is consistently wrong - i.e., some part of the corruption is deterministic, and not the apparent fault of a bad disk.
> CentOS 4.6 is, unfortunately, not the apparent culprit.  When checking on CentOS 5.x, I could recreate the corruption issue.  The second node was also a 64-bit compile, running CentOS 5.2 (`uname -r` returns 2.6.18-92.1.10.el5).
> Thanks for looking into this,
> Brian

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

