pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size
Date Wed, 20 Mar 2013 17:07:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607860#comment-13607860 ]

Koji Noguchi commented on PIG-3251:

bq. With HADOOP-7823, can we remove Bzip2TextInputFormat and just use PigTextInputFormat?
That'll have (almost) the same effect as my initial patch pig-3251-trunk-v01.patch, which takes
us to status (2) in my previous comment.  With HADOOP-7823 + HADOOP-6109, it'll be (3).
Without a doubt, HADOOP-7823 + HADOOP-6109 is the cleanest approach.

> Bzip2TextInputFormat requires double the memory of maximum record size
> ----------------------------------------------------------------------
>                 Key: PIG-3251
>                 URL: https://issues.apache.org/jira/browse/PIG-3251
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch
> While looking at a user's OOM heap dump, I noticed that Pig's Bzip2TextInputFormat consumes
> memory in both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream)
> and the actual Text that is returned as the line.
> For example, when having one record with 160MBytes, buffer was 268MBytes and Text was
> We can probably eliminate one of them.
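
To illustrate the double allocation described above, here is a minimal sketch in plain Java. It is not Pig's code: the class and method names are made up for illustration, the record is scaled down from 160 MBytes to 160,000 bytes, and a plain byte[] copy stands in for the Text that is returned as the line. It assumes the record is appended in small chunks to a default-sized ByteArrayOutputStream, whose backing array doubles whenever it fills up. That growth pattern would be consistent with a 160 MByte record leaving a 268 MByte (2^28 bytes) buffer, though the heap dump itself is the only evidence here.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Pig or Hadoop.
class BufferGrowthSketch {

    /** ByteArrayOutputStream subclass that exposes the size of its protected backing array. */
    static class MeasurableBuffer extends ByteArrayOutputStream {
        int capacity() {
            return buf.length;
        }
    }

    /** Appends recordBytes bytes in chunks of chunkBytes and returns the backing array size. */
    static int capacityAfterWriting(int recordBytes, int chunkBytes) {
        MeasurableBuffer buffer = new MeasurableBuffer();
        byte[] chunk = new byte[chunkBytes];
        int written = 0;
        while (written < recordBytes) {
            int n = Math.min(chunkBytes, recordBytes - written);
            buffer.write(chunk, 0, n);
            written += n;
        }
        return buffer.capacity();
    }

    public static void main(String[] args) {
        int recordBytes = 160_000; // scaled-down stand-in for the 160 MByte record
        MeasurableBuffer buffer = new MeasurableBuffer();
        byte[] chunk = new byte[16];
        for (int written = 0; written < recordBytes; written += chunk.length) {
            buffer.write(chunk, 0, chunk.length);
        }

        // Copying the bytes out (as building a separate line object would)
        // allocates a second array while the buffer is still referenced.
        byte[] line = buffer.toByteArray();

        System.out.println("record bytes:      " + recordBytes);
        System.out.println("buffer capacity:   " + buffer.capacity());
        System.out.println("copied line bytes: " + line.length);
        System.out.println("held at once:      " + ((long) buffer.capacity() + line.length));
    }
}
```

In this sketch the buffer alone ends up larger than the record, and the copy adds the record size again on top, which is the overlap the description suggests eliminating.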

