hadoop-mapreduce-user mailing list archives

From Ahmed Abdeen Hamed <ahmed.elma...@gmail.com>
Subject Re: distributing a time consuming single reduce task
Date Tue, 24 Jan 2012 02:49:35 GMT
Thanks very much Steve!

The clustering part of the code is really a black box, and there isn't much
I can do as far as restructuring. I ended up breaking the big input file into
smaller ones, and I am letting it run on the cluster. I will know in the
morning whether it succeeded or not. But I will consider using Mahout for
clustering, since it is built on top of MapReduce. I will let you know how
that goes if you are interested.
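
For anyone following the thread, the splitting step can be sketched roughly
as below. This is only a minimal illustration, assuming the input is
line-oriented and that the black-box clusterer can be run on each chunk
independently (which may change the resulting clusters). The class name
InputSplitter, the part-NNNNN.txt file names, and the chunk size are all
hypothetical and are not taken from the actual job.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class InputSplitter {

    // Splits a line-oriented file into chunk files holding at most
    // linesPerChunk lines each. Returns the chunk paths in order.
    // Chunk names (part-00000.txt, ...) are illustrative only.
    public static List<Path> split(Path input, Path outDir, int linesPerChunk)
            throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                String line;
                int linesInChunk = 0;
                while ((line = reader.readLine()) != null) {
                    // Start a new chunk on the first line or when the current one is full.
                    if (writer == null || linesInChunk == linesPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outDir.resolve(
                            String.format("part-%05d.txt", chunks.size()));
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        chunks.add(chunk);
                        linesInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }

    // Usage: java InputSplitter <input-file> <output-dir> <lines-per-chunk>
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: InputSplitter <input> <outDir> <linesPerChunk>");
            return;
        }
        List<Path> chunks =
            split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("wrote " + chunks.size() + " chunk(s)");
    }
}
```

Each chunk can then be submitted as its own input. Splitting on line
boundaries keeps every record whole, but it does nothing to merge clusters
across chunks, so that step would still have to be handled separately.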

Thanks very much once again for your kind responses!
-Ahmed


On Mon, Jan 23, 2012 at 9:09 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

>  It sounds like the  HierarchicalClusterer  whatever that is is doing what
> a collection of reducers should be doing - try to restructure the job so
> that the clustering is done more in the sort step allowing the reducer to
> simply collect clusters - the cluster method needs to be
> rearchitected to lean more heavily on map-reduce
>
