hadoop-hdfs-issues mailing list archives

From "Lars Ailo Bongo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1768) fs -put crash that depends on source file name
Date Fri, 18 Mar 2011 17:34:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008532#comment-13008532
] 

Lars Ailo Bongo commented on HDFS-1768:
---------------------------------------

In case my previous comment was unclear, I believe the following sequence caused the error:
1. A copyFromLocalFile call crashed after creating the checksum file, but before deleting it.
2. The content of status-test.txt was then changed, so the new checksum does not match the
checksum in the old, non-deleted checksum file.
3. Subsequent copyFromLocalFile calls use the old checksum file.
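
If the diagnosis above is right, removing the leftover checksum file before retrying should
avoid the mismatch. Below is a minimal sketch of that cleanup, not a tested fix for this bug.
It assumes the sidecar naming seen in this report ("." + file name + ".crc" in the same
directory as the source file); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class StaleCrcCleanup {
    /**
     * Returns the assumed sidecar path for a local source file,
     * e.g. status-test.txt -> .status-test.txt.crc in the same directory.
     */
    public static Path checksumFileFor(Path source) {
        Path abs = source.toAbsolutePath();
        return abs.resolveSibling("." + abs.getFileName() + ".crc");
    }

    /** Deletes the sidecar if it exists; returns true when a file was removed. */
    public static boolean removeChecksumFile(Path source) throws IOException {
        return Files.deleteIfExists(checksumFileFor(source));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: StaleCrcCleanup <local-source-file>");
            return;
        }
        Path source = Paths.get(args[0]);
        System.out.println(removeChecksumFile(source)
                ? "removed " + checksumFileFor(source)
                : "no checksum file found for " + source);
    }
}
```

Running this against status-test.txt before "hadoop fs -put" would drop .status-test.txt.crc,
so the copy should no longer be checked against the old checksums.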

Something related happens if the checksum file is invalid, as in:

/home/larsab/troilkatt2/test-tmp/data>cat > .status-test.txt.crc
dsds
dsdsdsd
/home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt foo7
11/03/18 18:28:00 WARN fs.FSInputChecker: Problem opening checksum file: status-test.txt.
 Ignoring exception: java.io.IOException: Not a checksum file: .status-test.txt.crc
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:222)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
	at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
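
The "Not a checksum file" warning above appears to come from a header check on the .crc
file. As a rough illustration, the sketch below tests whether a file starts with the magic
bytes {'c', 'r', 'c', 0}. That value is my assumption about what ChecksumFileSystem expects
and has not been verified against the 0.20.2 source; the class and method names are
hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class CrcHeaderCheck {
    // Assumed magic header of a Hadoop local checksum file (not verified).
    static final byte[] MAGIC = {'c', 'r', 'c', 0};

    /** Returns true if the file starts with the assumed magic bytes. */
    public static boolean looksLikeChecksumFile(Path crcFile) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(crcFile)) {
            int off = 0;
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    return false; // file is shorter than the header
                }
                off += n;
            }
        }
        return Arrays.equals(header, MAGIC);
    }
}
```

Under that assumption, a file containing the text "dsds" would fail the check, which is
consistent with the warning logged for the hand-written .status-test.txt.crc.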


> fs -put crash that depends on source file name
> ----------------------------------------------
>
>                 Key: HDFS-1768
>                 URL: https://issues.apache.org/jira/browse/HDFS-1768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20.2
>         Environment: Cloudera CDH3B4 in pseudo mode on a Linux 2.6.32-28-generic #55-Ubuntu SMP x86_64 kernel, with Java HotSpot 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Lars Ailo Bongo
>            Priority: Minor
>
> I have a unit test that includes writing a file to HDFS using copyFromLocalFile. Sometimes the function fails due to a checksum error. Once the issue has occurred, "hadoop fs -put <filename> <anywhere>" also fails as long as the filename is the same as the one used in the unit test. The error is due to the file content never being sent to the DataNode; hence the file has size zero.
> The error is not due to the file content. The error does not depend on the HDFS destination name. Restarting the NameNode and DataNode does not resolve the issue. I have not been able to reproduce the error with a simple program. I have also not tested the issue in distributed or standalone mode.
> The only "fix" is to change the source filename.
> Below are the error and the NameNode log. There is no entry for this operation in the DataNode log.
> /home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt status-test.txt3
> 11/03/18 16:59:54 INFO fs.FSInputChecker: Found checksum error: b[512, 968]=3a646f6e650a323a7365636f6e6453746167653a73746172740a323a7365636f6e6453746167653a646f6e650a323a746869726453746167653a73746172740a323a746869726453746167653a646f6e650a323a74686553696e6b3a73746172740a323a74686553696e6b3a646f6e650a323a54726f696c6b6174743a646f6e650a333a54726f696c6b6174743a73746172740a333a746865536f757263653a73746172740a333a746865536f757263653a646f6e650a333a666972737453746167653a73746172740a333a666972737453746167653a646f6e650a333a7365636f6e6453746167653a73746172740a333a7365636f6e6453746167653a646f6e650a333a746869726453746167653a73746172740a333a746869726453746167653a646f6e650a333a74686553696e6b3a73746172740a333a74686553696e6b3a646f6e650a333a54726f696c6b6174743a646f6e650a343a54726f696c6b6174743a73746172740a343a746865536f757263653a73746172740a343a746865536f757263653a646f6e650a343a666972737453746167653a73746172740a343a666972737453746167653a646f6e650a343a7365636f6e6453746167653a7265636f7665720a
> org.apache.hadoop.fs.ChecksumException: Checksum error: status-test.txt at 512
> 	at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
> 	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> 	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
> 	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> 	at java.io.DataInputStream.read(DataInputStream.java:83)
> 	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
> 	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
> 	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
> 	at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
> 	at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
> put: Checksum error: status-test.txt at 512
> NAMENODE
> 2011-03-18 16:59:54,422 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number
of transactions: 13 Total time for transactions(ms): 1Number of transactions batched in Syncs:
0 Number of syncs: 7 SyncTimes(ms): 220 
> 2011-03-18 16:59:54,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=larsab	ip=/127.0.0.1	cmd=create	src=/user/larsab/status-test.txt3	dst=null	perm=larsab:supergroup:rw-r--r--
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  file
/user/larsab/status-test.txt3 from client DFSClient_-1004170418
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile:
file /user/larsab/status-test.txt3 is closed by DFSClient_-1004170418

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
