pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigMultiQueryPerformanceSpecification" by GuntherHagleitner
Date Sat, 07 Feb 2009 20:55:32 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by GuntherHagleitner:
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

------------------------------------------------------------------------------
  ===== hadoop 0.19 supports MultipleOutput =====
  Link: http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html#addNamedOutput(org.apache.hadoop.mapred.JobConf,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class)
  
- All the output will still be in the same directory, but the developer can give name for different sets of output data. So, in our case we might name the output "split1" and "split2" and the output would come out to be:
+ All the output will still be in the same directory, but the developer can give names for different sets of output data. So, in our case we might name the output "split1" and "split2" and the output would come out to be:
  
  {{{
  /outdir/split1-0000
@@ -287, +287 @@

  ===== MRCompiler (Phase 2 and 3) =====
  The MR Compiler right now looks for splits, terminates the MR job at that point and connects the remaining operators via load and store.
  
- We'll add a new optimizer pass to look for these split scenarios. This gives us the ability to use the combiner plan information to make the determination of multipexing or not (Phase 3) and also allows us more easily to switch back to the old style handling, if multiple outputs are not available.
+ We'll add a new optimizer pass to look for these split scenarios. This gives us the ability to use the combiner plan information to make the determination of multiplexing or not (Phase 3) and also allows us more easily to switch back to the old style handling, if multiple outputs are not available.
  
  [[Anchor(Parallelism_(Phase_3))]]
  ===== Parallelism (Phase 3) =====
