hadoop-mapreduce-user mailing list archives

From Aaron Eng <a...@maprtech.com>
Subject Re: Why they recommend this (CPU) ?
Date Thu, 11 Oct 2012 19:15:04 GMT
Without a doubt, there are many CPU-intensive workloads where the number of
CPU cycles consumed to process a given amount of data is many times higher
than what would be considered normal.  But at the same time, memory-intensive
workloads and IO-bound workloads are common as well.  I've worked with
companies that run all three on a single cluster, which is another point to
be aware of.

Unless you are building a single-application, single-purpose cluster,
you'll probably have a mix of jobs with a mix of resource profiles.  So
designing a cluster so your CPU-heavy job runs faster may mean you skimped
on spindles or disk speed, and when you want to run your new application
and do your mixed workload, you end up with a bottleneck on the IO side.

So keep in mind not just the profile of a specific workload, but the profile
of the work you want to support on the cluster in general.
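One quick way to see which resource profile a job leans toward is to compare CPU time consumed to wall-clock time while it runs. Here's a minimal sketch in plain Python (not Hadoop-specific; the function names and the two toy workloads are made up purely for illustration):

```python
import time

def profile(fn, *args):
    """Run fn and report (cpu_seconds, wall_seconds, cpu/wall ratio).

    For a single-threaded task, a ratio near 1.0 suggests CPU-bound;
    a ratio well below 1.0 means the task spent most of its wall time
    blocked, e.g. waiting on disk or network IO.
    """
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu, wall, cpu / wall

def crunch(n):
    # CPU-heavy stand-in: a tight arithmetic loop
    total = 0
    for i in range(n):
        total += i * i
    return total

def wait(seconds):
    # IO-ish stand-in: sleeping blocks the same way real IO would
    time.sleep(seconds)

cpu, wall, ratio = profile(crunch, 2_000_000)
print(f"crunch: cpu={cpu:.3f}s wall={wall:.3f}s ratio={ratio:.2f}")

cpu, wall, ratio = profile(wait, 0.5)
print(f"wait:   cpu={cpu:.3f}s wall={wall:.3f}s ratio={ratio:.2f}")
```

On a real cluster you'd look at the same signal from the task side (e.g. CPU time vs. elapsed time in the task counters) rather than wrapping the code yourself, but the interpretation is the same: sustained high CPU/wall ratios argue for faster cores, low ratios argue for more spindles.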

On Thu, Oct 11, 2012 at 12:03 PM, Russell Jurney wrote:
> My own clusters are too temporary and virtual for me to notice. I haven't
> thought of clock speed as mattering in a long time, so I'm curious what
> kind of use cases might benefit from faster cores. Is there a category of
> use case where this sweet spot for faster cores occurs?
> Russell Jurney http://datasyndrome.com
> On Oct 11, 2012, at 11:39 AM, Ted Dunning <tdunning@maprtech.com> wrote:
> You should measure your workload.  Your experience will vary dramatically
> with different computations.
> On Thu, Oct 11, 2012 at 10:56 AM, Russell Jurney <russell.jurney@gmail.com
> > wrote:
>> Anyone got data on this? This is interesting, and somewhat
>> counter-intuitive.
>> Russell Jurney http://datasyndrome.com
>> On Oct 11, 2012, at 10:47 AM, Jay Vyas <jayunit100@gmail.com> wrote:
>> > Presumably, if you have a reasonable number of cores, speeding the
>> > cores up will be better than forking a task into smaller and smaller
>> > chunks, because at some point the overhead of multiple processes would
>> > become a bottleneck, maybe due to streaming reads and writes.  I'm sure
>> > each and every problem has a different sweet spot.
