hadoop-user mailing list archives

From Milind Vaidya <kava...@gmail.com>
Subject Checksum Exception : Why is it happening and how to avoid it ?
Date Wed, 10 Aug 2016 13:16:46 GMT
I am trying to upload a file to S3.

Locally the file is generated using :
*org.apache.hadoop.io.compress.GzipCodec*
The corresponding .crc file is generated too.
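
For reference, here is a minimal sketch of how the .crc sidecar could be located and inspected with plain Java, without going through the Hadoop libraries. It assumes, as I understand Hadoop 2.x's ChecksumFileSystem, that the sidecar for `name.gz` is a hidden file `.name.gz.crc` in the same directory, and that it starts with the bytes `c`, `r`, `c`, `0` followed by a big-endian int holding the bytes-per-checksum value. The class `CrcSidecar` and its methods are hypothetical helpers, not part of Hadoop.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Hadoop class.
class CrcSidecar {
    // Assumed magic bytes at the start of a Hadoop .crc file.
    private static final byte[] MAGIC = {'c', 'r', 'c', 0};

    // Maps "dir/name.gz" to "dir/.name.gz.crc" (assumed naming convention).
    static Path sidecarFor(Path dataFile) {
        String name = "." + dataFile.getFileName() + ".crc";
        Path parent = dataFile.getParent();
        return parent == null ? Paths.get(name) : parent.resolve(name);
    }

    // Reads the bytes-per-checksum value from the sidecar header.
    // Returns -1 if the file does not start with the assumed magic bytes.
    static int bytesPerChecksum(Path crcFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(crcFile))) {
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                return -1;
            }
            return in.readInt(); // DataInputStream reads big-endian ints
        }
    }
}
```

Calling `CrcSidecar.bytesPerChecksum(CrcSidecar.sidecarFor(dataFile))` on one of the failing files would show the chunk size the checksums were written with, if the layout assumption holds.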

There are two scenarios in which the file is read, and hence in which the exception can occur:

1. While uploading
2. While trimming (extracting only required part from a bigger file)

I avoided the exception in case 1 by bypassing the Hadoop libraries
altogether, but I could not do the same in case 2. The exception trace is
as follows:

Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /home/user/prod_binaries_1/message_logs/backup/21738_20/dt=20160807/hour=02/hostname-error_log-20160807-02-00.gz at 2096128
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:254)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:159)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:143)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
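
The offset reported in the trace (2096128) can be mapped back to a checksum chunk. The following is a minimal sketch of that arithmetic, assuming 512 bytes per checksum (the Hadoop default as far as I know, not confirmed for this setup) and a .crc layout of an 8-byte header followed by one 4-byte CRC per chunk. The class `ChecksumOffset` and its constants are hypothetical and only illustrate the calculation.

```java
// Hypothetical helper for illustration only; not a Hadoop class.
class ChecksumOffset {
    // Assumed .crc layout: 4 magic bytes + 4-byte bytes-per-checksum, then 4 bytes per chunk.
    static final int HEADER_LENGTH = 8;
    static final int CRC_SIZE = 4;

    // Index of the checksum chunk that contains the given data offset.
    static long chunkIndex(long dataPos, int bytesPerSum) {
        return dataPos / bytesPerSum;
    }

    // Offset inside the .crc file where that chunk's CRC would be stored.
    static long crcFilePos(long dataPos, int bytesPerSum) {
        return HEADER_LENGTH + (long) CRC_SIZE * (dataPos / bytesPerSum);
    }

    public static void main(String[] args) {
        long dataPos = 2096128L; // offset from the exception message
        int bytesPerSum = 512;   // assumed default chunk size
        System.out.println("chunk index:    " + chunkIndex(dataPos, bytesPerSum));
        System.out.println("offset in chunk: " + (dataPos % bytesPerSum));
        System.out.println("crc file pos:   " + crcFilePos(dataPos, bytesPerSum));
    }
}
```

Under those assumptions the failing offset is exactly the start of chunk 4094, and the CRC for that chunk would sit at byte 16384 of the .crc file.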

Hadoop native library version: 2.7.0

This does not happen every time. It shows up when the load is high and
the file is several MB in size. In the QA and staging environments, where
the load is lower, it works fine.

What is going wrong here?
