hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-49) optimize bag usage
Date Tue, 20 Jan 2009 22:54:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665624#action_12665624 ]

Alan Gates commented on PIG-49:
-------------------------------

At this point I think there is no plan to fix this.  We have implemented a streaming interface
for cogroup (one of the tables is streamed).  For straight group by queries we are counting
on the fact that most aggregate UDFs are algebraic and can use the combiner, and thus do not
need the bag optimization proposed here.  Unless I see any objections, I'll mark this as Won't Fix.
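
To illustrate why an algebraic aggregate does not need the whole bag in memory, here is a
minimal Python sketch (not Pig code) of an average split into initial, intermediate, and
final steps. The function names and the sample data are hypothetical and chosen only for
illustration; the point is that each combiner-like step emits a small partial result, so
the final step never sees the full group.

```python
from functools import reduce

# Hypothetical three-stage decomposition of AVG, for illustration only.

def avg_initial(value):
    # Map side: turn one input value into a (sum, count) partial.
    return (value, 1)

def avg_intermediate(a, b):
    # Combiner: merge two partials; associative and commutative.
    return (a[0] + b[0], a[1] + b[1])

def avg_final(partial):
    # Reduce side: turn the merged partial into the answer.
    total, count = partial
    return total / count if count else None

def combine(values):
    # Simulates one combiner run over the values seen by a single map task.
    return reduce(avg_intermediate, (avg_initial(v) for v in values), (0, 0))

# Assumed sample data: values for one group key, spread over three map tasks.
map_outputs = [[1, 2, 3], [4, 5], [6]]

partials = [combine(chunk) for chunk in map_outputs]
result = avg_final(reduce(avg_intermediate, partials, (0, 0)))
```

Only one (sum, count) pair per map task reaches the final step, regardless of how many
values the group contains.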

> optimize bag usage
> ------------------
>
>                 Key: PIG-49
>                 URL: https://issues.apache.org/jira/browse/PIG-49
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> (1) Currently, we always bring the entire bag into memory even though in most cases we
> just need to stream through it. This is very inefficient in terms of memory and CPU usage.
> (2) If we are doing multiple computations on the same group, we iterate over the bag
> that represents the group several times. This is very inefficient, especially for spilled bags.
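
The two points in the issue description can be sketched in a few lines of Python (not Pig
code). This is only an illustration under assumed names: `stream_bag` and `multi_aggregate`
are hypothetical helpers. The bag is exposed as an iterator rather than a list, and several
aggregates are computed in a single pass over it.

```python
def stream_bag(rows):
    # Hypothetical stand-in for a bag that is streamed, not materialized:
    # values are yielded one at a time and cannot be re-read.
    for row in rows:
        yield row

def multi_aggregate(bag):
    # Computes COUNT, SUM, and MAX in one pass, keeping only three
    # scalars in memory instead of the whole bag.
    count = 0
    total = 0
    maximum = None
    for value in bag:
        count += 1
        total += value
        if maximum is None or value > maximum:
            maximum = value
    return count, total, maximum

# Assumed sample values for one group.
stats = multi_aggregate(stream_bag([3, 1, 4, 1, 5]))
```

Because the generator is exhausted after one traversal, this style only works when all
computations on the group are folded into the same loop.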

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

