hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: How do I generate a histogram?
Date Mon, 09 May 2011 21:21:49 GMT
Is the approximate distribution of f(value) known in advance? Alternatively, you can sample
and use TotalOrderPartitioner <http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html>
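A rough sketch of what sampling plus range partitioning buys you. It is written as a plain Python simulation, not against Hadoop's API. The function names `choose_split_points`, `partition` and `total_order_histogram` are hypothetical, and `bin_of` stands in for whatever f(value) you use. The sketch sorts a sample of bin keys, takes evenly spaced cut points, and routes each key to the reducer that owns its range. In Hadoop, a sampler's partition file and TotalOrderPartitioner divide roughly this job between them. Note that the sample is drawn over the bin keys f(value) rather than the raw values, because the bin keys are what get partitioned.

```python
import bisect
import random


def choose_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 cut points from a sample of keys.

    Assumes the sample is representative of the full key distribution.
    """
    ordered = sorted(sample_keys)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Reducer i receives keys k with split_points[i-1] <= k < split_points[i]."""
    return bisect.bisect_right(split_points, key)


def total_order_histogram(values, bin_of, num_reducers, sample_rate=0.1, seed=0):
    """Simulate a histogram job whose reducer outputs, read in order, are sorted by bin."""
    keys = [bin_of(v) for v in values]  # map side: key each record by f(value)
    if not keys:
        return []
    rng = random.Random(seed)
    # Sample the bin keys (falling back to all keys if the sample is empty).
    sample = [k for k in keys if rng.random() < sample_rate] or keys
    splits = choose_split_points(sample, num_reducers)

    # "Shuffle": route every key to the reducer that owns its range.
    reducers = [{} for _ in range(num_reducers)]
    for k in keys:
        counts = reducers[partition(k, splits)]
        counts[k] = counts.get(k, 0) + 1

    # Each reducer emits its bins sorted; since the ranges are disjoint and
    # increasing, concatenating outputs in reducer order is globally sorted.
    output = []
    for counts in reducers:
        output.extend(sorted(counts.items()))
    return output
```

For example, `total_order_histogram(range(1000), lambda v: v // 100, num_reducers=3)` counts ten bins of width 100. Every bin lands in exactly one reducer, so the concatenated output is already in bin order with no final `sort -n` pass.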

On Mon, May 9, 2011 at 2:11 PM, W.P. McNeill <billmcn@gmail.com> wrote:

> Oops, I forgot to describe the full extent of what I'm trying to do.
> Obviously a histogram is just a word count.  However, I'm also trying to
> generate the histogram in order by bins.  I want to be able to use more
> than one reducer, so I'll need a total ordering.  But I wasn't sure how
> to write a total ordering when the value I'm trying to order on is a
> function of the original data value.  I figured that this kind of
> trickiness is what the Aggregate framework was for, so I set about
> trying to understand it.
> Right now I'm generating output that is not totally ordered, then piping
> the string representation through "sort -n".  This works because my
> histograms are small, but I'd like to do it the right way.
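The "histogram is just a word count" view in the quoted message can be sketched as a tiny in-memory map/reduce. This is a simulation, not Hadoop code. `bin_key`, `map_phase` and `reduce_phase` are hypothetical names, and the bin width and zero-padding are illustrative assumptions.

```python
from collections import Counter


def bin_key(value, bin_width=10.0, width=8):
    """Hypothetical f(value): the bin index, zero-padded so that string
    order matches numeric order. Assumes value >= 0."""
    return str(int(value // bin_width)).zfill(width)


def map_phase(values):
    # Same shape as word count: emit (bin, 1) for every input value.
    for v in values:
        yield bin_key(v), 1


def reduce_phase(pairs):
    # Sum the 1s per bin; sorted() stands in for the framework sorting
    # keys before they reach a single reducer.
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return sorted(counts.items())
```

If the bin keys were written as unpadded strings, plain string comparison would put "10" ahead of "9". Zero-padding, or a numeric key type, avoids that. With more than one reducer, each output is still sorted only within itself, so a total ordering also needs range partitioning across reducers.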
