pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "UserDefinedOrdering" by AlanGates
Date Sat, 10 Nov 2007 00:52:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by AlanGates:

  `getComparator` that returns a `java.util.Comparator` object.  Currently
  `EvalSpec` hard wires the return function for this.
- A new subclass of `ProjectSpec` will be created `SortProjectSpec`.  This class
- will look like:
+ `EvalSpec` will be changed to include a new member `Comparator<Tuple> comparator`
+ and a new method `void setComparator(Comparator<Tuple> comparator)`.  The
+ existing anonymous class currently returned by `EvalSpec.getComparator()` will
+ be retained, and will be the default value for `EvalSpec.comparator`.  If the
+ user specifies a comparator in the ORDER BY clause, then the parser will call
+ `EvalSpec.setComparator` on the `ProjectSpec` for the
+ associated `SortDistinctSpec`.
+ These changes will handle nested sorts (that is, those inside a foreach).
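A minimal sketch of the `EvalSpec` change described above, under stated assumptions: `EvalSpecSketch` is a hypothetical stand-in for `EvalSpec`, the type parameter `T` plays the role of `Tuple`, and natural ordering stands in for the existing anonymous default comparator, whose real logic is not shown on this page.

```java
import java.util.Comparator;

// Hypothetical stand-in for EvalSpec; T plays the role of Tuple.
class EvalSpecSketch<T extends Comparable<T>> {
    // Default value for the new member. Natural ordering is an assumption
    // standing in for the anonymous class that getComparator() returns today.
    private Comparator<T> comparator = new Comparator<T>() {
        @Override
        public int compare(T a, T b) {
            return a.compareTo(b);
        }
    };

    // Called by the parser when the ORDER BY clause names a comparator.
    void setComparator(Comparator<T> comparator) {
        this.comparator = comparator;
    }

    // Returns the default unless setComparator has replaced it.
    Comparator<T> getComparator() {
        return comparator;
    }
}
```

With this shape, callers of `getComparator` need no changes; only the parser has to know about `setComparator`.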
- {{{
- class SortProjectSpec extends ProjectSpec {
+ For top-level sorts, a few more changes need to be made:
- ProjectSpec(List<Integer> cols, Comparator<Tuple> comparator)
- {
- 	super(cols);
- 	mComparator = comparator;
- }
+ First, `SortPartitioner`, which is used to determine which partition a key is
+ in, needs to change to use the designated comparator rather than the
+ default.  In the function `getPartition` it calls `Arrays.binarySearch(Object[], Object)`.
+ This needs to be changed to use the generic version that accepts a
+ comparator, `Arrays.binarySearch(T[], T, Comparator<? super T>)`.  The information
+ about which comparator to use will be stored in the `JobConf` passed to
+ `configure` and retrieved by calling `getOutputKeyComparatorClass`.
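The partition lookup with a comparator might look roughly like the sketch below. `SortPartitionerSketch` is a hypothetical stand-in that takes the quantile boundaries and the comparator through its constructor rather than from a `JobConf`, and the mapping from search result to partition number is an assumption, not necessarily what `SortPartitioner` does.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for SortPartitioner; T plays the role of the sort key.
class SortPartitionerSketch<T> {
    private final T[] quantiles;            // boundaries, sorted per the comparator
    private final Comparator<T> comparator; // user-specified or default ordering

    SortPartitionerSketch(T[] quantiles, Comparator<T> comparator) {
        this.quantiles = quantiles;
        this.comparator = comparator;
    }

    int getPartition(T key) {
        // The comparator overload searches using the same ordering that
        // was used to sort the quantile boundaries.
        int index = Arrays.binarySearch(quantiles, key, comparator);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return index >= 0 ? index : -index - 1;
    }
}
```

The boundaries must already be sorted with the same comparator that is passed to the search; otherwise the result of `Arrays.binarySearch` is undefined.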
- @Override
- public Comparator<Tuple> getComparator()
- {
-    return new Comparator<Tuple>() {
-        public int compare(Tuple t1, Tuple t2) {
-     		return mComparator.compare(simpleEval(t1), simpleEval(t2));
-        }
-    };
- }
+ A member `userComparator` of type `Comparator` will need to be added to `POMapReduce`.
- private Comparator<Tuple> mComparator;
- }
- }}}
+ In `MapReduceLauncher.launchPig`, a check needs to be made to see whether
+ `POMapReduce.userComparator` is non-null.  If so, then in addition to calling
+ `JobConf.setOutputKeyClass` it will also need to call
+ `JobConf.setOutputKeyComparatorClass` and provide the user-specified
+ comparator ('''NOTE''':  Need to check with Hadoop people and make sure I have
+ this right.)
- If I understand things correctly this will handle both hooking the comparator
- into the pipeline and making sure that the sort keys are passed to the
- comparator (as they will be what is in the projection list for the
- `ProjectSpec`).  Is this correct Utkarsh?
+ `MapreducePlanCompiler.getQuantileJob` will need to change to set
+ `userComparator` for the `POMapReduce` object it constructs for the quantiles
+ job (currently named `quantileJob`).  The method `getSortJob` will need to
+ change in the same way, setting `userComparator` for the `sortJob POMapReduce`
+ object.  Both of these methods can obtain the proper comparator by calling
+ `getSortSpec().getComparator()` on the passed-in `loSort` argument.  These
+ methods will need to check whether `getComparator` is returning the default
+ comparator or a user-provided one, and set `userComparator` only in the
+ latter case, rather than always setting it to the result.
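One way the default-versus-user check could be sketched is shown below. `ComparatorSelectionSketch` and `userComparatorOrNull` are hypothetical names, and the identity comparison assumes the default comparator is a single shared instance, which this page does not confirm.

```java
import java.util.Comparator;

// Hypothetical helper; not part of Pig.
class ComparatorSelectionSketch {
    // Returns the value to store in userComparator: null when the sort spec
    // still holds the default comparator, otherwise the user-provided one.
    // Assumes the default is a shared instance that can be compared by identity.
    static <T> Comparator<T> userComparatorOrNull(Comparator<T> fromSortSpec,
                                                  Comparator<T> defaultComparator) {
        return fromSortSpec == defaultComparator ? null : fromSortSpec;
    }
}
```

Leaving `userComparator` null in the default case keeps the non-null check in `MapReduceLauncher.launchPig` meaningful.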