pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1294) Generated Execution Code Prototyping
Date Fri, 12 Mar 2010 19:45:27 GMT
Generated Execution Code Prototyping

                 Key: PIG-1294
                 URL: https://issues.apache.org/jira/browse/PIG-1294
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Daniel Dai

Currently Pig has a set of Physical Operators that contain the logic to execute Pig programs.
To execute a given program, a pipeline of these physical operators is constructed, split into
MapReduce jobs, and shipped to Hadoop. We need to investigate changing the physical operators
so that, instead of executing that logic themselves, they know how to generate Java code. Pig
could then generate Java code, compile it, and pass the compiled code to Hadoop. Some sources
we have read suggest that this could yield a significant performance improvement. It would also
allow Pig to use pre-compiled tuples specific to a given script, which should improve memory
usage and performance. On the downside, this would make the code more complex to develop and
maintain. It would also make Pig more complex to install, as it would require a Java compiler
as part of the Pig deployment.
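To make the contrast concrete, here is a minimal Java sketch of the two execution models. All class and method names (PipelineSketch, Operator, FilterGreaterThan, ProjectField, NameAgeTuple, runGenerated) are hypothetical illustrations, not Pig's actual physical operator API. The interpreted half pushes generic tuples through a chain of operator objects; the generated half shows what code emitted for a script such as `B = FILTER A BY age > 10; C = FOREACH B GENERATE name;` might look like, using a script-specific tuple class with typed fields. It only shows where per-record overhead comes from; it does not demonstrate any measured speedup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified model; NOT Pig's actual PhysicalOperator API.
public class PipelineSketch {

    // Interpreted model: every operator sees a generic tuple (List<Object>) and is
    // invoked through a common interface, so each record pays for interface dispatch,
    // boxed values, and casts at every step of the pipeline.
    public interface Operator {
        // Returns the transformed tuple, or null if the tuple is filtered out.
        List<Object> process(List<Object> tuple);
    }

    // Keeps tuples whose integer value at index `field` is greater than `threshold`.
    public static class FilterGreaterThan implements Operator {
        private final int field;
        private final int threshold;

        public FilterGreaterThan(int field, int threshold) {
            this.field = field;
            this.threshold = threshold;
        }

        @Override
        public List<Object> process(List<Object> tuple) {
            return ((Integer) tuple.get(field)) > threshold ? tuple : null;
        }
    }

    // Projects a single field into a new one-field tuple.
    public static class ProjectField implements Operator {
        private final int field;

        public ProjectField(int field) {
            this.field = field;
        }

        @Override
        public List<Object> process(List<Object> tuple) {
            return Collections.singletonList(tuple.get(field));
        }
    }

    // Pushes one tuple through the operator chain; stops early if a filter drops it.
    public static List<Object> runInterpreted(List<Operator> pipeline, List<Object> tuple) {
        List<Object> current = tuple;
        for (Operator op : pipeline) {
            current = op.process(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // A script-specific "pre-compiled" tuple: typed fields instead of List<Object>.
    public static final class NameAgeTuple {
        final String name;
        final int age;

        public NameAgeTuple(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // What generated code for "filter on age > 10, then project name" might look like:
    // one straight-line method with no per-operator dispatch, boxing, or casts.
    public static String runGenerated(NameAgeTuple t) {
        return t.age > 10 ? t.name : null;
    }

    public static void main(String[] args) {
        List<Operator> pipeline = Arrays.asList(new FilterGreaterThan(1, 10), new ProjectField(0));
        List<Object> row = Arrays.asList("alice", 30);
        System.out.println(runInterpreted(pipeline, row));                // prints [alice]
        System.out.println(runGenerated(new NameAgeTuple("alice", 30))); // prints alice
    }
}
```

Both paths compute the same result; the difference is that the generated method is specialized to one script, which is exactly what makes it harder to write and maintain by hand and why it would have to be generated.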
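Generating code also means compiling it at runtime. The sketch below is a minimal illustration, assuming a JDK is present at runtime, that uses the standard javax.tools API (ToolProvider.getSystemJavaCompiler(), JavaCompiler.getTask, SimpleJavaFileObject) to compile a source string and load the resulting class. RuntimeCompileSketch, StringSource, compileAndLoad, and the GeneratedFilter source are hypothetical names chosen for illustration, not part of Pig; shipping the compiled classes to Hadoop is not shown.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class RuntimeCompileSketch {

    // Exposes an in-memory source string to the compiler as a compilation unit.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles `source` (which declares `className`) into `outputDir`, then loads the class.
    public static Class<?> compileAndLoad(String className, String source, Path outputDir)
            throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Happens on a runtime without a bundled compiler: the deployment cost of this approach.
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        List<String> options = Arrays.asList("-d", outputDir.toString());
        Boolean ok = compiler.getTask(null, null, null, options, null,
                Collections.singletonList(new StringSource(className, source))).call();
        if (!Boolean.TRUE.equals(ok)) {
            throw new IllegalStateException("Compilation failed for " + className);
        }
        // The loader is intentionally left open so the loaded class stays usable.
        URLClassLoader loader = new URLClassLoader(new URL[] {outputDir.toUri().toURL()},
                RuntimeCompileSketch.class.getClassLoader());
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical generated code for a "FILTER ... BY age > 10" step.
        String src = "public class GeneratedFilter {\n"
                + "    public static boolean accept(int age) { return age > 10; }\n"
                + "}\n";
        Path out = Files.createTempDirectory("pig-codegen-sketch");
        Class<?> generated = compileAndLoad("GeneratedFilter", src, out);
        Object accepted = generated.getMethod("accept", int.class).invoke(null, 30);
        System.out.println(accepted); // prints true
    }
}
```

Compilation failures are only reported as a boolean here (diagnostics go to standard error); a real implementation would likely pass a DiagnosticListener to getTask to surface errors in generated code.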

We have marked this issue as a candidate project for the Google Summer of Code 2010 program.
We are looking for students who would like to prototype this execution model.

This message is automatically generated by JIRA.

View raw message