hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach
Date Thu, 11 Jun 2009 23:53:07 GMT
PERFORMANCE: streaming data to the UDFs in foreach
--------------------------------------------------

                 Key: PIG-844
                 URL: https://issues.apache.org/jira/browse/PIG-844
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich


Currently, Pig collects the data passed to a UDF into a bag before invoking it. This can make
the process use more memory than necessary; in many cases it would be better to push the data
to the UDF one tuple at a time.
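
To make the contrast concrete, here is a minimal sketch in plain Python. It is not Pig's UDF API: `rows`, `sum_from_bag`, `StreamingSum`, `accumulate` and `get_value` are hypothetical names chosen for illustration only. The bag-style call needs the whole group in memory, while the streaming-style object holds only a running total.

```python
from typing import Iterator, List, Tuple

Row = Tuple[str, int]  # hypothetical (key, value) tuple


def rows() -> Iterator[Row]:
    """Hypothetical input for one group, produced lazily."""
    for value in range(1, 6):
        yield ("k", value)


def sum_from_bag(bag: List[Row]) -> int:
    """Bag style: the whole group is materialized before the UDF runs."""
    return sum(value for _, value in bag)


class StreamingSum:
    """Streaming style: the caller pushes one tuple at a time."""

    def __init__(self) -> None:
        self.total = 0

    def accumulate(self, row: Row) -> None:
        # Only the running total is kept; the tuple can be discarded afterwards.
        self.total += row[1]

    def get_value(self) -> int:
        return self.total


# Bag style: list(...) forces every tuple of the group into memory at once.
bag_result = sum_from_bag(list(rows()))

# Streaming style: tuples are consumed one by one as they are produced.
udf = StreamingSum()
for row in rows():
    udf.accumulate(row)
stream_result = udf.get_value()
```

Both paths compute the same value; the difference is only in how much of the group has to be held at once.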

When the combiner is invoked, this might not be that important; however, for non-algebraic
UDFs, and in other cases where the combiner can't be used, streaming can provide a significant
memory improvement.

Another possible use case is where the data is already grouped when it enters Pig, so we don't
need to group it again.
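
A rough sketch of the pre-grouped case, again in plain Python rather than Pig: it assumes all rows for a key arrive adjacent to each other, and uses `itertools.groupby` to aggregate each run as it streams past. The input `pre_grouped` and the helper `stream_group_sums` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input that already arrives grouped: rows of a key are adjacent.
pre_grouped = [("a", 1), ("a", 2), ("b", 10), ("c", 3), ("c", 4)]


def stream_group_sums(rows):
    """Yield (key, sum) for each run of adjacent rows that share a key."""
    for key, run in groupby(rows, key=itemgetter(0)):
        total = 0
        for _, value in run:  # consume the run one tuple at a time
            total += value
        yield key, total


# No regrouping or sorting step: the input order is trusted as given.
group_sums = list(stream_group_sums(iter(pre_grouped)))
```

If the assumption does not hold and a key reappears later in the input, this sketch emits a second entry for it instead of merging the two runs.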

How this will affect the UDF interface needs to be discussed further.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

