hadoop-common-user mailing list archives

From Shevek <she...@karmasphere.com>
Subject Re: Grouping in Combiners
Date Mon, 31 Oct 2011 21:02:00 GMT
On 31 October 2011 13:37, Mathias Herberts <mathias.herberts@gmail.com> wrote:

> I don't know if it's a bug but I'd rather have the ability to set a
> Combiner specific group comparator than to have the Combiner use the group
> comparator set for the Reducer.
> On Oct 31, 2011 9:21 PM, "Harsh J" <harsh@cloudera.com> wrote:
>

Now I'm curious. Can you argue that there's a case where it makes a
difference? Preferably one where it can't be trivially curried into the
combiner?

S.

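For readers following the thread, here is a minimal sketch of the distinction being discussed, in plain Java with no Hadoop dependency. It assumes a composite key (a name plus a sequence number used only for ordering); the class, field, and method names are hypothetical and are not part of any Hadoop API. Grouping adjacent records with the sort comparator yields one group per distinct composite key, while grouping with a coarser comparator merges all records that share a name, which is the behaviour a combiner-specific grouping comparator would have to provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinerGroupingSketch {
    /** Hypothetical composite key: a natural key plus a field used only for ordering. */
    static final class Key {
        final String name;
        final int seq;
        Key(String name, int seq) { this.name = name; this.seq = seq; }
    }

    // Sort order: by name, then by seq (a typical secondary-sort arrangement).
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.name).thenComparingInt(k -> k.seq);

    // Grouping order: by name only, so every seq value for a name lands in one group.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.name);

    /** Sorts with SORT, then returns the sizes of runs of adjacent keys that `grouping` treats as equal. */
    static List<Integer> groupSizes(List<Key> keys, Comparator<Key> grouping) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<Integer> sizes = new ArrayList<>();
        Key prev = null;
        int run = 0;
        for (Key k : sorted) {
            if (prev != null && grouping.compare(prev, k) != 0) {
                sizes.add(run);   // the previous group is complete
                run = 0;
            }
            run++;
            prev = k;
        }
        if (run > 0) sizes.add(run);
        return sizes;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
            new Key("a", 2), new Key("b", 1), new Key("a", 1), new Key("a", 3));
        System.out.println(groupSizes(keys, SORT));   // grouping by the sort comparator
        System.out.println(groupSizes(keys, GROUP));  // grouping by name only
    }
}
```

With the sort comparator every record is its own group, so a combiner invoked once per group would have nothing to combine; with the name-only comparator the three "a" records arrive together.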

> > Shevek,
> >
> > The problem Mathias indicates here is that the Combiners do not utilize
> > the Grouping Comparators. They only use the Sort Comparators. Whether
> > that is a bug is what I wonder.
> >
> > On 31-Oct-2011, at 11:14 PM, Shevek wrote:
> >
> > > > I like the ability to reuse a Java component for both sorting and
> > > > grouping, and to be honest, since the cases where one can do a
> > > > comparison without deserializing the raw bytes are relatively few and
> > > > far between, I tend to use Java's Comparator interface, and wrap it in
> > > > some infrastructure-specific adapter. I have a vague feeling that Hadoop
> > > > sometimes calls the byte interface and sometimes the object interface
> > > > anyway? ICBW, the way I've been writing code makes it irrelevant.
> > > >
> > > > Alternatively, I've misunderstood the (simpler) question, and the
> > > > answer is to use the setGroupingComparatorClass() API.
> > >
> > > S.
> > >
> > > > On 29 October 2011 04:35, Mathias Herberts <mathias.herberts@gmail.com> wrote:
> > >
> > >> Another point concerning the Combiners,
> > >>
> > >> the grouping is currently done using the RawComparator used for
> > >> sorting the Mapper's output. Wouldn't it be useful to be able to set a
> > >> custom CombinerGroupingComparatorClass?
> > >>
> > >> Mathias.
> > >>
> >
> >
>

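The earlier quoted message describes wrapping a java.util.Comparator in an infrastructure-specific adapter so that one component serves both sorting and grouping. A rough sketch of that idea follows, assuming keys are decoded from a byte range before comparison. The `ByteRangeComparator` interface is a local stand-in defined here for illustration, not Hadoop's own interface, and the `Key` layout (two big-endian ints) is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.function.Function;

public class DeserializingComparatorSketch {
    /** Local stand-in for a comparator over serialized byte ranges (assumption, not a Hadoop type). */
    interface ByteRangeComparator {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    /** Adapts an object-level Comparator to byte ranges by decoding both sides first. */
    static <T> ByteRangeComparator adapt(Function<ByteBuffer, T> decoder, Comparator<T> delegate) {
        return (b1, s1, l1, b2, s2, l2) -> delegate.compare(
            decoder.apply(ByteBuffer.wrap(b1, s1, l1)),
            decoder.apply(ByteBuffer.wrap(b2, s2, l2)));
    }

    /** Hypothetical key: (id, ts) encoded as two big-endian ints. */
    static final class Key {
        final int id;
        final int ts;
        Key(int id, int ts) { this.id = id; this.ts = ts; }
        static Key decode(ByteBuffer buf) { return new Key(buf.getInt(), buf.getInt()); }
        byte[] encode() { return ByteBuffer.allocate(8).putInt(id).putInt(ts).array(); }
    }

    // The same object-level comparators are reused for both roles.
    static final Comparator<Key> BY_ID = Comparator.comparingInt((Key k) -> k.id);
    static final Comparator<Key> BY_ID_THEN_TS = BY_ID.thenComparingInt(k -> k.ts);

    static final ByteRangeComparator SORT = adapt(Key::decode, BY_ID_THEN_TS);
    static final ByteRangeComparator GROUP = adapt(Key::decode, BY_ID);

    public static void main(String[] args) {
        byte[] a = new Key(1, 5).encode();
        byte[] b = new Key(1, 9).encode();
        // Same id, different ts: distinct under SORT, equal under GROUP.
        System.out.println(SORT.compare(a, 0, a.length, b, 0, b.length));
        System.out.println(GROUP.compare(a, 0, a.length, b, 0, b.length));
    }
}
```

Decoding on every comparison costs more than a byte-level comparison would, which is the trade-off the message accepts in exchange for writing the ordering logic once.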