pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigMultiQueryPerformanceSpecification" by RichardDing
Date Tue, 05 May 2009 23:03:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by RichardDing:
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

------------------------------------------------------------------------------
     * The parallelism of the merged splitter job is the maximum of the parallelisms of all splittee jobs.
     * The keys from inner plans are partitioned into all the buckets via the default hash partitioner.
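
The two rules above can be sketched in a few lines of Java. This is only an illustration, not code from Pig: the class and method names (MergedParallelismSketch, mergedParallelism, bucketFor) are hypothetical, the fallback of 1 for an empty input is an assumption, and the bucket formula simply mirrors what Hadoop's default HashPartitioner does (mask the sign bit of hashCode(), then take the modulus).

```java
import java.util.Arrays;

public class MergedParallelismSketch {
    // Hypothetical helper: the merged splitter job takes the largest
    // parallelism requested by any of its splittee jobs.
    // Assumption: fall back to 1 when no splittee is given.
    static int mergedParallelism(int[] splitteeParallelisms) {
        return Arrays.stream(splitteeParallelisms).max().orElse(1);
    }

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit so the result is non-negative, then take the modulus.
    static int bucketFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Three splittee jobs asking for 4, 10 and 6 reducers.
        int reducers = mergedParallelism(new int[] {4, 10, 6});
        System.out.println("merged parallelism = " + reducers);
        System.out.println("bucket for key 'apple' = " + bucketFor("apple", reducers));
    }
}
```

With splittee parallelisms of 4, 10 and 6, the merged job would run with 10 reducers, and every key from every inner plan lands in one of those 10 buckets.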
  
+ This scheme has two advantages:
+ 
+    * Simplicity. No new partition class needed.
+    * Performance. The parallelism that users specify for a job is most likely determined by the number of available reducers (machines), so the merged parallelism conforms to user expectations.
+ 
  To avoid key collisions between different inner plans under this scheme, the PigNullableWritable class is modified to take the indexes into account when two keys are compared (hashed).
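
A minimal sketch of the idea, assuming a simplified key type: IndexedKey below is a hypothetical stand-in, not the actual PigNullableWritable class, and the choice to compare the value first and the index second is an assumption made for illustration. The point it shows is that two keys with the same value but different inner-plan indexes are neither equal nor guaranteed to share a hash.

```java
import java.util.Objects;

public class IndexedKeySketch {
    // Hypothetical stand-in for a key that remembers which inner plan produced it.
    static final class IndexedKey implements Comparable<IndexedKey> {
        final int index;     // index of the inner plan (assumed field)
        final String value;  // the key value itself

        IndexedKey(int index, String value) {
            this.index = index;
            this.value = value;
        }

        // Assumption: order by value, then break ties on the plan index,
        // so equal values from different plans never compare as 0.
        @Override
        public int compareTo(IndexedKey other) {
            int byValue = value.compareTo(other.value);
            return byValue != 0 ? byValue : Integer.compare(index, other.index);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IndexedKey)) return false;
            IndexedKey k = (IndexedKey) o;
            return index == k.index && value.equals(k.value);
        }

        // The index is folded into the hash along with the value.
        @Override
        public int hashCode() {
            return Objects.hash(index, value);
        }
    }

    public static void main(String[] args) {
        IndexedKey fromPlan0 = new IndexedKey(0, "apple");
        IndexedKey fromPlan1 = new IndexedKey(1, "apple");
        System.out.println("equal? " + fromPlan0.equals(fromPlan1));
        System.out.println("compareTo is zero? " + (fromPlan0.compareTo(fromPlan1) == 0));
    }
}
```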

    
  [[Anchor(Local_Execution_engine)]]
