mahout-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Re: Fewer reducers leading to better distributed k-means job performance?
Date Wed, 07 Sep 2011 04:41:12 GMT
Ok, ran the job with 1 reducer and with 16 reducers; results pretty much
confirm Konstantin's point ...

1 reducer = 47 minutes
16 reducers = 41 minutes
32 reducers = 44 minutes
48 reducers = 1 hour

So there is a little benefit to using more than one reducer, but not much
;-)
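
For reference, the only knob being varied in these runs is the standard Hadoop
reduce-task count. A minimal sketch of how it can be set, assuming a plain
Hadoop 0.20-style driver rather than the actual Mahout k-means driver (the
class name below is made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class KMeansIterationDriverSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same property that -Dmapred.reduce.tasks=16 would set when the
        // driver runs through ToolRunner.
        conf.setInt("mapred.reduce.tasks", 16);

        Job job = new Job(conf, "kmeans-iteration");
        // Equivalently, through the Job API:
        job.setNumReduceTasks(16);

        // ... mapper/combiner/reducer classes, input/output paths, etc. ...
        // job.waitForCompletion(true);
      }
    }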


On Tue, Sep 6, 2011 at 5:47 PM, Timothy Potter <thelabdude@gmail.com> wrote:

> Hi Ted / Konstantin,
>
> Thanks for the feedback. You are correct in that some reducers are doing
> nothing, but many are doing real work, albeit for a very short period of
> time. I'll run this with 1 reducer and post back my results.
>
> Cheers,
> Tim
>
>
> On Tue, Sep 6, 2011 at 3:33 PM, Konstantin Shmakov <kshmakov@gmail.com> wrote:
>
>> K-means has mappers, combiners and reducers, and in my experience with
>> k-means the mappers and combiners are responsible for most of the
>> performance.  In fact, most k-means jobs will use 1 reducer by
>> default.
>>
>> Did you verify that multiple reducers are actually doing something?
>>
>> Mappers write each point with its cluster assignment, combiners read these
>> points and produce intermediate centroids, and the reducer does a minimal
>> amount of work, producing the final centroids from the intermediate ones.
>> One possibility is that by specifying #reducers you partially disable the
>> combiner optimization that runs on the mapper nodes - this could result in
>> more shuffling and data sent between nodes.
>>
>> -- Konstantin
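
For reference, a rough sketch of the dataflow Konstantin describes above --
plain Java with made-up names, not Mahout's actual KMeansMapper/Combiner/
Reducer classes -- showing why almost all of the work lands on the map and
combine side while the reducer only merges a handful of partial sums per
cluster:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class KMeansDataflowSketch {

      /** What a combiner emits per cluster: a running vector sum plus a point count. */
      static class Partial {
        final double[] sum;
        long count;
        Partial(int dim) { sum = new double[dim]; }
        void addPoint(double[] p) {
          for (int i = 0; i < p.length; i++) sum[i] += p[i];
          count++;
        }
        void merge(Partial other) {
          for (int i = 0; i < sum.length; i++) sum[i] += other.sum[i];
          count += other.count;
        }
      }

      /** Map side: assign a point to its nearest centroid (squared Euclidean distance). */
      static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
          double d = 0;
          for (int i = 0; i < point.length; i++) {
            double diff = point[i] - centroids[c][i];
            d += diff * diff;
          }
          if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
      }

      /** Combine side: collapse all points a mapper saw into one Partial per cluster. */
      static Map<Integer, Partial> combine(List<double[]> points, double[][] centroids) {
        Map<Integer, Partial> partials = new HashMap<Integer, Partial>();
        for (double[] p : points) {
          int c = nearest(p, centroids);
          Partial part = partials.get(c);
          if (part == null) {
            part = new Partial(p.length);
            partials.put(c, part);
          }
          part.addPoint(p);
        }
        return partials;
      }

      /** Reduce side: merge the few partials per cluster and divide for the new centroid. */
      static double[] reduce(int dim, List<Partial> partialsForOneCluster) {
        Partial total = new Partial(dim);
        for (Partial p : partialsForOneCluster) total.merge(p);
        double[] centroid = new double[dim];
        for (int i = 0; i < dim; i++) centroid[i] = total.sum[i] / total.count;
        return centroid;
      }
    }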
>>
>> On Tue, Sep 6, 2011 at 9:40 AM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>> > It could also have to do with context switch rate, memory set size or
>> > memory bandwidth contention.  Having too many threads of certain kinds
>> > can cause contention on all kinds of resources.
>> >
>> > Without detailed internal diagnostics, these can be very hard to tease
>> > out.  Looking at load average is a good first step.  If you can get to
>> > some memory diagnostics about cache miss rates, you might be able to
>> > get further.
>> >
>> > On Tue, Sep 6, 2011 at 10:50 AM, Timothy Potter <thelabdude@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm running a distributed k-means clustering job on a 16-node EC2
>> >> cluster (xlarge instances). I've experimented with 3 reducers per node
>> >> (mapred.reduce.tasks=48) and 2 reducers per node
>> >> (mapred.reduce.tasks=32). In one of my runs (k=120) on 6m vectors with
>> >> roughly 20k dimensions, I've seen a 25% improvement in job performance
>> >> using 2 reducers per node instead of 3 (~45 mins to do 10 iterations
>> >> with 32 reducers vs. ~1 hour with 48 reducers). The input data and
>> >> initial clusters are the same in both cases.
>> >>
>> >> My sense was that maybe I was over-utilizing resources with 3 reducers
>> >> per node, but in fact the load average remains healthy (< 4 on xlarge
>> >> instances with 4 virtual cores) and the nodes don't swap or do anything
>> >> obvious like that. So the only other variable I can think of is that
>> >> the number and size of the output files from one iteration, which are
>> >> sent as input to the next iteration, may have something to do with the
>> >> performance difference. As further evidence for this hunch, running the
>> >> job with vectors of 11k dimensions, the improvement was only about
>> >> 10% -- so the performance gains grow with more data.
>> >>
>> >> Does anyone else have any insights into what might be leading to this
>> >> result?
>> >>
>> >> Cheers,
>> >> Tim
>> >>
>> >
>>
>>
>>
>> --
>> ksh:
>>
>
>
