pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigMultiQueryPerformanceSpecification" by RichardDing
Date Tue, 05 May 2009 23:03:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by RichardDing:

     * The parallelism of the merged splitter job is the maximum of the parallelisms of all
splittee jobs.
     * The keys from inner plans are partitioned into all the buckets via the default hash
+ This scheme has advantages: 
+    * Simplicity. No new partition class needed.
+    * Performance. The parallelism a user specifies for a job is most likely determined
by the number of available reducers (machines), so the merged parallelism conforms to the
user's expectation.
  To avoid key collisions between different inner plans under this scheme, the PigNullableWritable
class is modified to take the indexes into account when two keys are compared (hashed).
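
The scheme described above can be sketched roughly as follows. This is only an illustration, not the actual Pig code: the class `MultiQueryKeySketch`, the `IndexedKey` type, and the methods `mergedParallelism` and `bucketFor` are hypothetical names, and the choice to compare on the plan index before the key value is an assumption. The bucket formula mirrors the one used by Hadoop's default HashPartitioner.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from the Pig code base.
class MultiQueryKeySketch {

    // A key tagged with the index of the inner plan that produced it.
    static final class IndexedKey implements Comparable<IndexedKey> {
        final int planIndex;
        final String value;

        IndexedKey(int planIndex, String value) {
            this.planIndex = planIndex;
            this.value = Objects.requireNonNull(value);
        }

        // Equality includes the index, so equal values from
        // different inner plans are treated as different keys.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IndexedKey)) {
                return false;
            }
            IndexedKey other = (IndexedKey) o;
            return planIndex == other.planIndex && value.equals(other.value);
        }

        // The hash also mixes in the index, consistent with equals.
        @Override
        public int hashCode() {
            return 31 * value.hashCode() + planIndex;
        }

        // Assumption: order by plan index first, then by value.
        @Override
        public int compareTo(IndexedKey other) {
            int byIndex = Integer.compare(planIndex, other.planIndex);
            return byIndex != 0 ? byIndex : value.compareTo(other.value);
        }
    }

    // Parallelism of the merged job: the maximum over all splittee jobs.
    static int mergedParallelism(int[] splitteeParallelisms) {
        int max = 1; // assume at least one reducer
        for (int p : splitteeParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    // Default hash partitioning: non-negative hash modulo the bucket count.
    static int bucketFor(IndexedKey key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int buckets = mergedParallelism(new int[] {4, 10, 6});
        IndexedKey fromPlan0 = new IndexedKey(0, "a");
        IndexedKey fromPlan1 = new IndexedKey(1, "a");
        System.out.println("buckets = " + buckets);
        System.out.println("same key? " + fromPlan0.equals(fromPlan1));
        System.out.println("bucket of plan 0 key: " + bucketFor(fromPlan0, buckets));
        System.out.println("bucket of plan 1 key: " + bucketFor(fromPlan1, buckets));
    }
}
```

Two keys from different inner plans may still land in the same bucket; what the index guarantees in this sketch is that they are never treated as the same key when grouped within that bucket.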

