pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Trivial Update of "ProposedProjects" by OlgaN
Date Thu, 07 May 2009 18:21:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/ProposedProjects

------------------------------------------------------------------------------
  || Execution || Pig currently executes scripts by building a pipeline of pre-built operators
and running data through those operators in map reduce jobs.  We need to investigate having
Pig instead generate Java code specific to a job, compile that code, and use it to
run the map reduce jobs. || || || Many conference attendees || gates ||
  || Language || Currently only DISTINCT, ORDER BY, and FILTER are allowed inside FOREACH.
 All operators should be allowed in FOREACH. (LIMIT is being worked on in [https://issues.apache.org/jira/browse/PIG-741 741].) || || || gates || ||
  || Optimization || Speed up comparison of tuples during shuffle for ORDER BY || [https://issues.apache.org/jira/browse/PIG-659 659] || || olgan || ||
- || Optimization || Order by should be changed to not use POPackage to put all of the tuples
in a bag on the reduce side, as the bag is just immediately flattened.  It can instead work
like join does for the last input in the join. || || || gates || ||
+ || Optimization || Order by should be changed to not use POPackage to put all of the tuples
in a bag on the reduce side, as the bag is just immediately flattened.  It can instead work
like join does for the last input in the join. || [https://issues.apache.org/jira/browse/PIG-802 802] || || gates || olgan ||
  || Optimization || Often in a Pig script that produces a chain of MR jobs, the map phases
of the 2nd and subsequent jobs do very little.  What little they do should be pushed into the preceding
reduce and the map replaced by the identity mapper.  Initial tests showed that the identity
mapper was 50% faster than using a Pig mapper (because Pig uses the loader to parse out tuples
even if the map itself is empty). || [https://issues.apache.org/jira/browse/PIG-480 480] || || olgan || gates ||
  || Optimization || Use hand-crafted calls to do string to integer or float conversions.
 Initial tests showed these could be done about 8x faster than Integer.valueOf() and Float.valueOf().
|| [https://issues.apache.org/jira/browse/PIG-482 482] || || olgan || gates ||
  || Optimization || Currently Pig always samples for an ORDER BY to determine how to partition,
and then runs another job to do the sort.  For small enough inputs, it should just sort with
a single reducer. || [https://issues.apache.org/jira/browse/PIG-483 483] || || olgan || ||
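
The Execution item contrasts running data through a pipeline of pre-built operators with generating Java code specific to a job. The following is a minimal, hand-written Java sketch of that difference, assuming a hypothetical two-step job (add 1, then double) over integer records. It is not Pig code, and it does not show the generate-and-compile step itself, only the kind of specialized loop such generation might aim to produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class PipelineVsGenerated {
    // Interpreted style: a chain of generic, pre-built operators,
    // each applied to every record through an interface call on boxed values.
    static List<Integer> runPipeline(List<Integer> input, List<Function<Integer, Integer>> ops) {
        List<Integer> out = new ArrayList<>(input.size());
        for (Integer rec : input) {
            Integer v = rec;
            for (Function<Integer, Integer> op : ops) {
                v = op.apply(v);
            }
            out.add(v);
        }
        return out;
    }

    // "Generated" style (written by hand here): the same two steps,
    // add 1 then double, fused into one loop over primitive ints.
    static int[] runGenerated(int[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = (input[i] + 1) * 2;
        }
        return out;
    }
}
```

The generic version typically pays for per-operator dispatch and boxing on every record; the fused loop avoids both, which is the sort of overhead job-specific code generation would target.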
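
For the item on speeding up tuple comparison during the ORDER BY shuffle (PIG-659), one common technique is to compare keys in their serialized form rather than deserializing them into objects first. The Java sketch below shows that idea for a single int key, assuming a hypothetical encoding (sign bit flipped, then big-endian) chosen so that unsigned byte order matches numeric order. It illustrates the general technique and is not necessarily what PIG-659 implements.

```java
import java.nio.ByteBuffer;

class RawKeyCompare {
    // Hypothetical key encoding: flip the sign bit and write big-endian,
    // so unsigned byte-by-byte order equals signed int order.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Compare two encoded keys byte by byte, treating bytes as unsigned,
    // without decoding them back into ints or tuple objects.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```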
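
For the PIG-802 item, the difference between packaging a key's tuples into a bag and then flattening it, versus emitting them directly, can be sketched in plain Java. This assumes a hypothetical reduce signature (a key, an iterator over that key's values, and an emit callback); it is not Pig's POPackage or Hadoop's Reducer API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class OrderByReduce {
    // Bag style: materialize every value for the key, then flatten the bag.
    // Memory use grows with the number of tuples sharing a sort key.
    static void reduceWithBag(String key, Iterator<String> values, Consumer<String> emit) {
        List<String> bag = new ArrayList<>();
        while (values.hasNext()) {
            bag.add(values.next());
        }
        for (String v : bag) {
            emit.accept(v);
        }
    }

    // Streaming style: emit each value as soon as it is read; nothing is buffered.
    static void reduceStreaming(String key, Iterator<String> values, Consumer<String> emit) {
        while (values.hasNext()) {
            emit.accept(values.next());
        }
    }
}
```

Both produce the same output, but the streaming version never holds all of a key's tuples in memory at once, which matters when many records share a sort key.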
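
For the PIG-480 item, the proposed plan change can be sketched as moving a per-record function out of job 2's map and into the end of job 1's reduce, leaving job 2 with an identity mapper. The Java below models each phase as a list transformation with an arbitrary per-record function; the class and method names are hypothetical, and it does not model Pig's loader or use Hadoop's IdentityMapper class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class PushMapIntoReduce {
    // Apply a per-record function to every record.
    static <A, B> List<B> applyEach(List<A> records, Function<A, B> f) {
        List<B> out = new ArrayList<>(records.size());
        for (A rec : records) {
            out.add(f.apply(rec));
        }
        return out;
    }

    // Identity mapper: passes every record through unchanged.
    static <T> List<T> identityMap(List<T> records) {
        return new ArrayList<>(records);
    }

    // Current plan: job 1's reduce writes its output as-is, and
    // job 2's map re-reads it and does the small amount of work.
    static List<String> currentPlan(List<String> job1ReduceOutput, Function<String, String> mapWork) {
        return applyEach(job1ReduceOutput, mapWork);
    }

    // Proposed plan: the work runs at the end of job 1's reduce,
    // and job 2's map becomes the identity.
    static List<String> proposedPlan(List<String> job1ReduceOutput, Function<String, String> mapWork) {
        List<String> written = applyEach(job1ReduceOutput, mapWork);
        return identityMap(written);
    }
}
```

Both plans yield the same records; the speedup the proposal reports comes from the identity mapper skipping the loader's tuple parsing, which this model does not capture.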
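
For the PIG-482 item, a hand-crafted conversion typically walks the characters once and avoids the exception the standard parsers throw on malformed input. The Java sketch below covers only the integer case and follows the same negative-accumulation overflow checks that Integer.parseInt uses. Returning null on malformed or out-of-range input is an assumption made here for illustration, and the 8x figure in the proposal was not measured against this code.

```java
class FastParse {
    // Hand-rolled base-10 int parser. The value is accumulated as a negative
    // number so that Integer.MIN_VALUE parses without overflow.
    // Returns null instead of throwing on malformed or out-of-range input.
    static Integer parseIntOrNull(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        int i = 0;
        boolean negative = false;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            negative = first == '-';
            i = 1;
            if (s.length() == 1) {
                return null; // a lone sign is not a number
            }
        }
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                return null; // not a decimal digit
            }
            if (result < multMin) {
                return null; // result * 10 would overflow
            }
            result *= 10;
            if (result < limit + digit) {
                return null; // result - digit would overflow
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```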
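
For the PIG-483 item, the planner-side decision can be sketched as a size check. The threshold below is a hypothetical placeholder (the proposal does not say what counts as "small enough"), and treating an unknown input size as large is an assumption made so the planner falls back to the existing sample-then-sort plan.

```java
class OrderByPlanner {
    // Hypothetical cutoff for "small enough"; not a value taken from Pig.
    static final long SMALL_INPUT_BYTES = 100L * 1024 * 1024;

    enum Plan {
        SINGLE_REDUCER_SORT,          // one job: sort everything in a single reducer
        SAMPLE_THEN_PARTITIONED_SORT  // current behavior: sampling job, then sort job
    }

    // inputBytes < 0 means the input size is unknown.
    static Plan choose(long inputBytes) {
        if (inputBytes >= 0 && inputBytes <= SMALL_INPUT_BYTES) {
            return Plan.SINGLE_REDUCER_SORT;
        }
        return Plan.SAMPLE_THEN_PARTITIONED_SORT;
    }

    // Number of map reduce jobs each plan needs, per the proposal's description.
    static int jobCount(Plan plan) {
        return plan == Plan.SINGLE_REDUCER_SORT ? 1 : 2;
    }
}
```

Skipping the sampling job is where the gain would come from; a single reducer only pays off while one machine can sort the whole input in reasonable time.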
