hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-49) optimize bag usage
Date Wed, 12 Dec 2007 18:36:43 GMT
optimize bag usage

                 Key: PIG-49
                 URL: https://issues.apache.org/jira/browse/PIG-49
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich

(1) Currently, we always load the entire bag into memory, even though in most cases we only
need to stream through it. This wastes both memory and CPU.
(2) When we do multiple computations on the same group, we iterate over the bag that
represents the group once per computation. This is especially costly for bags that have
spilled to disk.
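
To make the two points concrete, here is a minimal sketch of the intended access pattern, not Pig's actual code: the bag is exposed only as an iterator, so it is never fully materialized, and several aggregates for the same group are computed in one pass. The names SinglePassBag, Stats, and summarize are hypothetical and chosen only for illustration, and the bag is assumed to hold plain long values rather than real tuples.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration; none of these names come from the Pig codebase.
class SinglePassBag {

    // Running aggregates for one group, updated one value at a time.
    static final class Stats {
        long count;
        long sum;
        long max = Long.MIN_VALUE;

        void accept(long value) {
            count++;
            sum += value;
            if (value > max) {
                max = value;
            }
        }
    }

    // Consumes the iterator exactly once and never holds the whole bag in memory.
    static Stats summarize(Iterator<Long> bag) {
        Stats stats = new Stats();
        while (bag.hasNext()) {
            stats.accept(bag.next());
        }
        return stats;
    }

    public static void main(String[] args) {
        // A small in-memory list stands in for a streamed (possibly spilled) bag.
        Iterator<Long> bag = Arrays.asList(3L, 1L, 4L).iterator();
        Stats s = summarize(bag);
        System.out.println(s.count + " " + s.sum + " " + s.max);
    }
}
```

Because count, sum, and max are updated together, a spilled bag would be read back from disk once rather than once per computation. Whether a given computation can be expressed this way depends on it being computable incrementally, which this sketch simply assumes.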

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
