hadoop-user mailing list archives

From Aseem Anand <aseem.ii...@gmail.com>
Subject Re: Ignore keys while scheduling reduce jobs
Date Fri, 14 Sep 2012 11:26:05 GMT
Consider a single-iteration K-means clustering job in which I only want to
schedule reduce tasks for one clusterId (the key in K-means): the clusterId
of the cluster that the first point in the dataset is assigned to.
That is, I want to check the clusterId of the first point in the input file
and run reduce tasks only for that specific clusterId.

So I think we will have to wait for all map tasks to finish.
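
To make the intent concrete, here is a minimal sketch in plain Java. It is
not Hadoop code and uses no Hadoop API; it only simulates the idea on an
in-memory list standing in for the combined map output. The class and method
names (SingleClusterReduce, Assignment, groupForTarget, centroid) are
hypothetical, and it assumes the "first point" is simply the first record in
that list, which in a real job would depend on how the input is split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleClusterReduce {

    /** One simulated map-output record: a clusterId key and its point. */
    public static final class Assignment {
        public final int clusterId;
        public final double[] point;

        public Assignment(int clusterId, double[] point) {
            this.clusterId = clusterId;
            this.point = point;
        }
    }

    /**
     * Groups map output by clusterId, keeping only the records whose key
     * equals targetClusterId. All other keys are dropped, so no reduce
     * work is ever done for them.
     */
    public static Map<Integer, List<double[]>> groupForTarget(
            List<Assignment> mapOutput, int targetClusterId) {
        Map<Integer, List<double[]>> groups = new LinkedHashMap<>();
        for (Assignment a : mapOutput) {
            if (a.clusterId == targetClusterId) {
                groups.computeIfAbsent(a.clusterId, k -> new ArrayList<>())
                      .add(a.point);
            }
        }
        return groups;
    }

    /** Reduce step of one K-means iteration: the mean of the points. */
    public static double[] centroid(List<double[]> points) {
        double[] mean = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += p[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= points.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        // Simulated output of ALL map tasks; the target key is only known
        // once the record for the first input point is available.
        List<Assignment> mapOutput = List.of(
            new Assignment(2, new double[] {1.0, 1.0}),  // first point
            new Assignment(0, new double[] {9.0, 9.0}),
            new Assignment(2, new double[] {3.0, 5.0}));

        int target = mapOutput.get(0).clusterId;
        Map<Integer, List<double[]>> groups = groupForTarget(mapOutput, target);
        double[] c = centroid(groups.get(target));
        System.out.println("clusterId " + target + ": " + c[0] + ", " + c[1]);
    }
}
```

The filter only needs the target clusterId, so in this sketch everything
hinges on knowing the first point's assignment before grouping starts.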


On Fri, Sep 14, 2012 at 4:43 PM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:

> Hi,
> When do you know the keys to ignore? You mentioned "after the map stage";
> is this at the end of each map task, or at the end of all map tasks?
> Thanks
> hemanth
> On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <aseem.iiith@gmail.com> wrote:
>> Hi,
>> Is there any way I can use a partitioner to ignore all keys except a
>> certain key (determined after the map stage), so that only 1 reduce job
>> is started? If so, could someone suggest such a method?
>> Regards,
>> Aseem
