hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed
Date Thu, 17 May 2007 07:35:16 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:

    Attachment: HADOOP-1193_1_20070517.patch

Here is a patch while I continue testing... Hairong, could you check whether it works
for you? Thanks!

Basically, I implemented a 'codec pool' that reuses the direct-buffer-based codecs instead
of creating so many of them.
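The pooling idea can be sketched roughly as follows. This is a hypothetical illustration, not the code in the patch: it uses java.util.zip.Inflater (which holds native zlib state) as a stand-in for the direct-buffer-based codecs, and the names InflaterPool, borrow(), release() and createdCount() are made up for the example. A caller borrows a decompressor, uses it, and hands it back; only when the pool is empty is a new instance created.

```java
import java.util.ArrayDeque;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch of a decompressor pool (not the patch's actual code).
 * Inflater holds native zlib state, so reusing instances avoids piling up
 * native allocations that the garbage collector may not reclaim promptly.
 */
public class InflaterPool {
    private final ArrayDeque<Inflater> free = new ArrayDeque<>();
    private int created = 0;

    /** Returns a pooled Inflater if one is free, otherwise creates a new one. */
    public synchronized Inflater borrow() {
        Inflater inf = free.poll();
        if (inf == null) {
            inf = new Inflater();
            created++;
        }
        return inf;
    }

    /** Resets the Inflater's state and makes it available for reuse. */
    public synchronized void release(Inflater inf) {
        inf.reset();
        free.push(inf);
    }

    /** Number of Inflater instances actually allocated so far. */
    public synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        InflaterPool pool = new InflaterPool();
        // Simulate opening 100 readers one after another, each needing
        // 4 decompressors (as for key, keyLen, val and valLen).
        for (int reader = 0; reader < 100; reader++) {
            Inflater[] codecs = new Inflater[4];
            for (int i = 0; i < codecs.length; i++) {
                codecs[i] = pool.borrow();
            }
            // ... decompress data with the borrowed codecs here ...
            for (Inflater inf : codecs) {
                pool.release(inf);
            }
        }
        // Only the first reader forces allocations; later readers reuse them.
        System.out.println("Inflaters created: " + pool.createdCount());
    }
}
```

Run as-is, this prints "Inflaters created: 4": the 100 simulated readers share the same four instances because each reader returns its decompressors before the next one is opened. Readers that stay open concurrently would still each hold their own instances, so the pool bounds allocations by peak concurrency rather than by the total number of readers.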

Results while sorting 1 million records via TestSequenceFile with RECORD compression
(codec instances created):

                   trunk    H-1193
  Compressors       1382         3
  Decompressors     1520        12
  Total             2902        15

Results are even more dramatic for BLOCK compression, since each Reader needs 4 codecs
(for key, keyLen, val and valLen). On the back of this patch, I have also bumped the
default direct-buffer size for zlib from 1K to 64K, which should improve performance
too.

Appreciate any review/feedback.

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1193_1_20070517.patch
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to
> be compressed. It worked fine with release 0.10.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.