pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize
Date Tue, 19 Mar 2013 00:05:16 GMT

    [ https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605835#comment-13605835 ]

Daniel Dai commented on PIG-3208:
---------------------------------

Ok, I am not objecting to that. My concern is that if the fix unintentionally introduces other issues, nobody would look at them. But I am fine with that; if it happens, we simply revert the patch.
                
> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> ----------------------------------------------------------------
>
>                 Key: PIG-3208
>                 URL: https://issues.apache.org/jira/browse/PIG-3208
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Eugene Koontz
>            Assignee: Eugene Koontz
>         Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, when invoked while reading TFiles, to return an error code if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where data is compressed on a cluster that uses the default {{io.compression.codec.lzo.buffersize}} of 256 * 1024, and when code then reads this data through Pig's zebra, the Mapper exits with code 134 (128 + SIGABRT) because the LZO decompressor returns -4 (which encodes the LZO C library error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The stack trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.InternalError: lzo1x_decompress returned: -4
>         at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
>         at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
>         at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
>         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
>         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
>         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
>         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>         at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1548)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1180)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1213)
>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
> {code}
> (Some additional LOG.info statements, which are not part of this patch, were added to produce the above output.)
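> A minimal sketch of one way to avoid the problem is shown below. It is hypothetical: it assumes {{conf}} is the Hadoop {{Configuration}} being populated in Compression.java, and the attached PIG-3208.patch may take a different approach. The idea is to treat 64 * 1024 as a fallback only, instead of unconditionally overwriting whatever the cluster configures:
> {code}
> // Hypothetical sketch: respect a cluster-configured buffer size and
> // fall back to 64K only when the key has not been set at all.
> if (conf.get("io.compression.codec.lzo.buffersize") == null) {
>     conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> }
> {code}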

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
