hadoop-common-dev mailing list archives

From "Richard Lee (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2080) ChecksumFileSystem checksum file size incorrect.
Date Mon, 22 Oct 2007 18:06:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536772 ]

Richard Lee commented on HADOOP-2080:
-------------------------------------

The +2 in my calculation is there because another int is written right after the header
in the checksum file at ChecksumFileSystem:306: the bytesPerSum value. It would be more
explicit to account for it with a separate term of 4 instead of folding it into the +2.

As to whether the size should be +1 or -1 before being divided by bytesPerSum... I don't
think it matters, since we add 1 to the result either way.

So maybe the final computation should be:

(((size-1)/bytesPerSum) + 1) * 4 + CHECKSUM_VERSION.length + 4
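
For illustration, here is a minimal Java sketch of that computation (an assumed helper, not the actual ChecksumFileSystem code; checksumVersionLength stands in for CHECKSUM_VERSION.length):

    // Sketch only: expected size of the checksum (.crc) file for a data file of
    // size bytes, given the layout discussed above: the CHECKSUM_VERSION header,
    // one int holding bytesPerSum, then a 4-byte CRC per bytesPerSum-sized chunk
    // of data. Assumes size > 0.
    static long checksumFileSize(long size, int bytesPerSum, int checksumVersionLength) {
      long chunks = (size - 1) / bytesPerSum + 1;  // integer ceiling of size/bytesPerSum, no float cast
      return chunks * 4                            // one 4-byte CRC per chunk
          + checksumVersionLength                  // version header
          + 4;                                     // the bytesPerSum int written after the header
    }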

> ChecksumFileSystem checksum file size incorrect.
> ------------------------------------------------
>
>                 Key: HADOOP-2080
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2080
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.14.0, 0.14.1, 0.14.2
>         Environment: Sun jdk1.6.0_02 running on Linux CentOS 5
>            Reporter: Richard Lee
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: ChecksumFileSystem.java.patch, TestInternalFilesystem.java
>
>
> Periodically, reduce tasks hang. When the log for the task is consulted, you see a stack trace that looks like this:
> 2007-10-18 17:02:04,227 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Insufficient space
> 	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.write(InMemoryFileSystem.java:174)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:326)
> 	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:122)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:310)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
> 	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
> 	at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:253)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:685)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:637)
> The problem stems from a miscalculation of the size of the checksum file created in the InMemoryFileSystem associated with the data being copied from a completed map task to the reduce task.
> The method used for calculating checksum file size is the following (ChecksumFileSystem:318):
> ((long)(Math.ceil((float)size/bytesPerSum)) + 1) * 4 + CHECKSUM_VERSION.length;
> The issue here is the cast to float. A float has only 24 bits of mantissa precision, so the expression returns values that are too small for any size over 0x1000000. The fix is to replace this calculation with one that doesn't cast to float.
> (((size+1)/bytesPerSum) + 2) * 4 + CHECKSUM_VERSION.length
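
For illustration, a minimal standalone sketch (not part of the attached patch) of the precision loss described above: once size exceeds 2^24, the cast to float rounds the value and the computed chunk count comes up short, so the space reserved for the checksum file is smaller than what actually gets written.

    public class FloatCeilDemo {
      public static void main(String[] args) {
        int bytesPerSum = 512;
        long size = 0x1000001L;  // 2^24 + 1 bytes
        // Chunk count as currently computed, via a cast to float.
        long chunksViaFloat = (long) Math.ceil((float) size / bytesPerSum);
        // Exact chunk count using integer arithmetic only.
        long chunksExact = (size - 1) / bytesPerSum + 1;
        // Prints "32768 vs 32769": the float-based count is one chunk (4 bytes) short.
        System.out.println(chunksViaFloat + " vs " + chunksExact);
      }
    }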

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

