hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1395) Mapside cogroup runs out of memory
Date Mon, 26 Apr 2010 22:43:32 GMT

     [ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:

    Attachment: cogrp_mem.patch

While doing cogroup, we first put tuples from all the relations in a heap, then we drain the
heap and generate the output tuple as appropriate. We need to look ahead at least one tuple
from every relation before generating an output tuple, to be sure that we have all the tuples
belonging to a key. Currently, we look too far ahead, and tuples start to accumulate in the
heap faster than we drain them. At a certain point we already have enough information to
generate the output tuple, instead of waiting and putting another tuple in the heap. This
patch generates the output tuple at that point.
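
The idea above can be illustrated with a minimal sketch. This is not the code in cogrp_mem.patch or Pig's actual implementation; the function name merge_cogroup and the (key, value) tuple layout are assumptions made for illustration only. It assumes every relation is already sorted by key, keeps exactly one look-ahead tuple per relation in the heap, and emits an output tuple as soon as the smallest key in the heap differs from the key being collected.

```python
import heapq


def merge_cogroup(relations):
    """Yield (key, bags) for relations that are each sorted by key.

    Hypothetical sketch: `relations` is a list of iterables of
    (key, value) pairs. bags[i] holds the values from relation i.
    """
    iters = [iter(r) for r in relations]
    heap = []

    # Look ahead exactly one tuple per relation.
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first[0], idx, first[1]))

    while heap:
        key = heap[0][0]
        bags = [[] for _ in relations]

        # Drain every tuple that carries the current key. Each pop is
        # followed by at most one push from the same relation, so the
        # heap never holds more than one tuple per relation.
        while heap and heap[0][0] == key:
            _, idx, value = heapq.heappop(heap)
            bags[idx].append(value)
            nxt = next(iters[idx], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt[0], idx, nxt[1]))

        # The smallest remaining key is larger than `key`, so no relation
        # can still supply tuples for it: emit the output tuple now.
        yield key, bags


if __name__ == "__main__":
    a = [(1, "a1"), (1, "a2"), (3, "a3")]
    b = [(1, "b1"), (2, "b2")]
    for key, bags in merge_cogroup([a, b]):
        print(key, bags)
```

In this sketch the heap size is bounded by the number of relations regardless of how many distinct keys the inputs contain, which is the property the description says the unpatched code loses by looking too far ahead.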

> Mapside cogroup runs out of memory
> ----------------------------------
>                 Key: PIG-1395
>                 URL: https://issues.apache.org/jira/browse/PIG-1395
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.8.0
>         Attachments: cogrp_mem.patch
> In a particular scenario, when there aren't a lot of tuples with the same key in a relation
> (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
