tez-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-1634) BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors
Date Wed, 01 Oct 2014 01:05:34 GMT

     [ https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-1634:
-------------------------
    Affects Version/s: 0.6.0
                       0.5.0

> BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors
> ---------------------------------------------------------------------------------------
>
>                 Key: TEZ-1634
>                 URL: https://issues.apache.org/jira/browse/TEZ-1634
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: BlockCompressorStream.with.logging.java, TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt
>
>
> When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and as part of
> FSDataOutputStream.close(), finish() is called again internally. Please refer to
> org.apache.hadoop.io.compress.BlockCompressorStream for more details on finish(). This
> leads to an additional 4 bytes being written to the IFile, which causes intermittent
> failures during shuffle. It also prevents IFileInputStream from doing proper checksumming.
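A minimal Java sketch of the double-finish effect described above. ToyBlockStream and DoubleFinishDemo are hypothetical names, and the stream is a deliberately simplified model, not Hadoop's BlockCompressorStream: it assumes only that each finish() emits a 4-byte length header for the buffered block and that close() calls finish() again. Under that assumption, an explicit finish() followed by close() writes a second, empty header, i.e. 4 extra bytes; a guard flag that makes finish() idempotent avoids them.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Toy model only: NOT Hadoop's BlockCompressorStream (no actual compression happens). */
public class DoubleFinishDemo {

    static class ToyBlockStream extends FilterOutputStream {
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private final boolean idempotentFinish; // hypothetical guard switch
        private boolean finished = false;

        ToyBlockStream(OutputStream out, boolean idempotentFinish) {
            super(out);
            this.idempotentFinish = idempotentFinish;
        }

        @Override
        public void write(int b) {
            pending.write(b); // buffer bytes until finish()
        }

        /** Emits a 4-byte length header followed by the buffered bytes. */
        public void finish() throws IOException {
            if (idempotentFinish && finished) {
                return; // guarded: a second finish() writes nothing
            }
            out.write(ByteBuffer.allocate(4).putInt(pending.size()).array());
            pending.writeTo(out);
            pending.reset();
            finished = true;
        }

        @Override
        public void close() throws IOException {
            finish(); // close() finishes implicitly, as described in the issue
            super.close();
        }
    }

    /** Mimics the reported sequence: explicit finish() and then close(). */
    static byte[] writeRecord(byte[] payload, boolean idempotentFinish) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ToyBlockStream s = new ToyBlockStream(sink, idempotentFinish);
        s.write(payload);
        s.finish(); // explicit finish()
        s.close();  // calls finish() a second time
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4, 5};
        System.out.println("without guard: " + writeRecord(payload, false).length + " bytes");
        System.out.println("with guard:    " + writeRecord(payload, true).length + " bytes");
    }
}
```

Running main() prints 13 bytes for the unguarded stream (9 bytes of block plus a 4-byte empty header) and 9 bytes for the guarded one. Making finish() idempotent is only one way to express the fix in this toy model; it is not necessarily what the attached patches do.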
> This error happens only when multiple attempt outputs are fetched using the same URL, and it
> is easily reproducible with SnappyCodec. The fetcher downloads the first attempt's output,
> but because of the trailing 4 bytes in the stream, IFileInputStream does not perform proper
> checksumming. This causes the download of the subsequent attempt to fail; an example error
> from the shuffle phase follows.
> >>>>
> 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
> java.lang.IllegalArgumentException: Invalid header received:  partition: 0
> 	at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
> 	at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
> 	at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
> >>>>
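The failure on a shared connection can be illustrated with a second toy model, again under loud assumptions: SharedStreamDemo, the MAGIC marker and the [MAGIC][length][payload] framing are hypothetical and are not Tez's shuffle wire format. The sketch shows a narrower point: when several attempt outputs are read back-to-back from one stream and the reader does not consume 4 stray trailing bytes of the first output, the next header read lands on those bytes and is rejected, much like the "Invalid header received" error in the log above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Toy model of several attempt outputs served back-to-back on one stream. */
public class SharedStreamDemo {
    static final int MAGIC = 0x54455A31; // hypothetical header marker, not a Tez constant

    /** Writes each attempt as [MAGIC][length][payload], optionally followed by 4 stray bytes. */
    static byte[] serve(int attempts, boolean strayTrailer) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < attempts; i++) {
            byte[] payload = {(byte) i, 42, 7};
            out.writeInt(MAGIC);
            out.writeInt(payload.length);
            out.write(payload);
            if (strayTrailer) {
                out.writeInt(0); // 4 bytes the reader does not expect
            }
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Reads attempts in order; throws IllegalArgumentException on an unexpected header. */
    static int fetchAll(byte[] wire, int attempts) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        for (int i = 0; i < attempts; i++) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("Invalid header received for attempt " + i);
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload); // consumes only the declared payload, not any trailer
        }
        return attempts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchAll(serve(2, false), 2) + " attempts fetched");
        try {
            fetchAll(serve(2, true), 2);
        } catch (IllegalArgumentException e) {
            System.out.println("second attempt fails: " + e.getMessage());
        }
    }
}
```

With the stray trailer, the first attempt is read successfully and the header check fails on the second one (attempt index 1), which mirrors the symptom reported here. In the real code path the leftover bytes relate to the incomplete checksumming in IFileInputStream, which this model does not reproduce.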
> I will attach the debug version of BlockCompressorStream with a thread dump, which confirms
> that finish() is called twice in IFile.close(). This bug was present in earlier versions
> of Tez as well, and I can now reproduce it consistently on a local VM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
