hadoop-common-issues mailing list archives

From "Lars Ailo Bongo (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-7199) Checksum file check
Date Fri, 18 Mar 2011 18:04:29 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Ailo Bongo updated HADOOP-7199:
------------------------------------

    Description: 
copyFromLocalFile crashes if a checksum file exists on the local filesystem and the checksum
does not match the file content. This will, for example, crash "hadoop fs -put ./foo ./foo"
with a non-descriptive error.

It is therefore not possible to do:

1. copyToLocalFile(hdfsFile, localFile)       // creates checksum file
2. modify localFile
3. copyFromLocalFile(localFile, hdfsFile)  // uses old checksum

Solution: do not reuse checksum files, or add a parameter to copyFromLocalFile that specifies
that checksum files should not be reused. A sketch of the failing sequence is below.
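
A minimal sketch of the failing sequence (the class name and paths are invented for
illustration; on the local filesystem Hadoop's ChecksumFileSystem keeps the checksum for a
file "foo" in a hidden sidecar file ".foo.crc" next to it):

    import java.io.FileWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StaleChecksumRepro {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path hdfsFile = new Path("foo");        // file on HDFS
        Path localFile = new Path("/tmp/foo");  // file on the local filesystem

        // 1. Copy to the local filesystem; this also writes the sidecar
        //    checksum file /tmp/.foo.crc
        fs.copyToLocalFile(hdfsFile, localFile);

        // 2. Modify the local file behind Hadoop's back, so /tmp/.foo.crc
        //    no longer matches the file content.
        FileWriter w = new FileWriter("/tmp/foo", true);
        w.write("one more line\n");
        w.close();

        // 3. Throws org.apache.hadoop.fs.ChecksumException, because the
        //    stale checksum from step 1 is reused when the local file is read.
        fs.copyFromLocalFile(localFile, hdfsFile);
      }
    }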

  was:
I have a unit test that writes a file to HDFS using copyFromLocalFile. Sometimes
the function fails due to a checksum error. Once the issue has occurred, "hadoop fs -put <filename>
<anywhere>" also fails as long as the filename is the same as the one used in the unit test.
When the error occurs, the file content is never sent to the DataNode, so the resulting file
has size zero.

The error is not due to the file content, and it does not depend on the HDFS destination
name. Restarting the NameNode and DataNode does not resolve the issue. I have not been able
to reproduce the error with a simple program, and I have not tested the issue in distributed
or standalone mode.

The only "fix" is to change the source filename.
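
Another possible workaround (my assumption, based on LocalFileSystem storing checksums in
hidden ".<name>.crc" sidecar files, not something verified in this report) is to delete the
stale sidecar before retrying the copy, e.g.:

    import java.io.File;

    public class DropStaleCrc {
      public static void main(String[] args) {
        // Assumption: the stale checksum for status-test.txt lives in the
        // hidden sidecar file .status-test.txt.crc in the same directory.
        File crc = new File("/home/larsab/troilkatt2/test-tmp/data/.status-test.txt.crc");
        if (crc.exists() && !crc.delete()) {
          System.err.println("Could not delete " + crc);
        }
      }
    }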

Below are the error output and the NameNode log. There is no entry for this operation in the
DataNode log.

/home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt status-test.txt3
11/03/18 16:59:54 INFO fs.FSInputChecker: Found checksum error: b[512, 968]=3a646f6e650a323a7365636f6e6453746167653a73746172740a323a7365636f6e6453746167653a646f6e650a323a746869726453746167653a73746172740a323a746869726453746167653a646f6e650a323a74686553696e6b3a73746172740a323a74686553696e6b3a646f6e650a323a54726f696c6b6174743a646f6e650a333a54726f696c6b6174743a73746172740a333a746865536f757263653a73746172740a333a746865536f757263653a646f6e650a333a666972737453746167653a73746172740a333a666972737453746167653a646f6e650a333a7365636f6e6453746167653a73746172740a333a7365636f6e6453746167653a646f6e650a333a746869726453746167653a73746172740a333a746869726453746167653a646f6e650a333a74686553696e6b3a73746172740a333a74686553696e6b3a646f6e650a333a54726f696c6b6174743a646f6e650a343a54726f696c6b6174743a73746172740a343a746865536f757263653a73746172740a343a746865536f757263653a646f6e650a343a666972737453746167653a73746172740a343a666972737453746167653a646f6e650a343a7365636f6e6453746167653a7265636f7665720a
org.apache.hadoop.fs.ChecksumException: Checksum error: status-test.txt at 512
	at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
	at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
put: Checksum error: status-test.txt at 512

NAMENODE
2011-03-18 16:59:54,422 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 13 Total time for transactions(ms): 1Number of transactions batched in Syncs:
0 Number of syncs: 7 SyncTimes(ms): 220 
2011-03-18 16:59:54,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=larsab
ip=/127.0.0.1	cmd=create	src=/user/larsab/status-test.txt3	dst=null	perm=larsab:supergroup:rw-r--r--
2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  file /user/larsab/status-test.txt3
from client DFSClient_-1004170418
2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile:
file /user/larsab/status-test.txt3 is closed by DFSClient_-1004170418

        Summary: Checksum file check   (was: fs -put crash that depends on source file name)

> Checksum file check 
> --------------------
>
>                 Key: HADOOP-7199
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7199
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.20.2
>         Environment: Cloudera CDH3B4 in pseudo-distributed mode on a Linux 2.6.32-28-generic #55-Ubuntu
SMP x86_64 kernel, with the Java HotSpot 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Lars Ailo Bongo
>            Priority: Minor
>
> copyFromLocalFile crashes if a checksum file exists on the local filesystem and the checksum
does not match the file content. This will, for example, crash "hadoop fs -put ./foo ./foo"
with a non-descriptive error.
> It is therefore not possible to do:
> 1. copyToLocalFile(hdfsFile, localFile)       // creates checksum file
> 2. modify localFile
> 3. copyFromLocalFile(localFile, hdfsFile)  // uses old checksum
> Solution: do not reuse checksum files, or add a parameter to copyFromLocalFile that
specifies that checksum files should not be reused.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
