cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: Throttling ColumnFamilyRecordReader
Date Tue, 19 Oct 2010 20:27:55 GMT
(Moving to user@.)

Isn't reducing the number of map tasks the easiest way to tune this?

Also: in 0.7 you can use NetworkTopologyStrategy to designate a group
of nodes as your hadoop "datacenter" so the workloads won't overlap.

On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores wrote:
> Does it make sense to add some kind of throttle capability on the
> ColumnFamilyRecordReader for Hadoop?
> If I have 60 or so Map tasks running at the same time when the cluster
> is already heavily loaded with OLTP operations, I can get some decreased
> on-line performance that may not be acceptable.  (I'm loading an 8 node
> cluster with 2000 TPS.)  By default my cluster of 8 nodes (which are
> also the Hadoop JobTracker nodes) has 8 Map tasks per node making the
> get_range_slices call, based on what the ColumnFamilyInputFormat has
> calculated from my token ranges.
> I can increase the inputSplitSize (ConfigHelper.setInputSplitSize()) so
> that there is only one Map task per node, and this helps quite a bit.
> But is it reasonable to provide a configurable sleep to cause a wait in
> between smaller size range queries?  That would stretch out the Map time
> and let the OLTP processing be less affected.
> --Michael
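
For illustration, the configurable sleep Michael describes could look
roughly like the sketch below. It is not part of ColumnFamilyRecordReader
or any Cassandra API: ThrottledIterator, batchSize and pauseMillis are
hypothetical names, and a plain Iterator stands in for the rows a record
reader would pull from get_range_slices. The idea is only to pause after
each batch of rows so the scan is stretched out over time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical wrapper (not a Cassandra or Hadoop class): hands out rows
 * from an underlying iterator and sleeps after every batchSize rows.
 */
final class ThrottledIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final int batchSize;     // rows to serve between pauses (assumed knob)
    private final long pauseMillis;  // length of each pause (assumed knob)
    private int servedInBatch = 0;

    ThrottledIterator(Iterator<T> delegate, int batchSize, long pauseMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        if (pauseMillis < 0) {
            throw new IllegalArgumentException("pauseMillis must not be negative");
        }
        this.delegate = delegate;
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        if (!delegate.hasNext()) {
            throw new NoSuchElementException();
        }
        // A full batch has been served: wait before starting the next one.
        if (servedInBatch == batchSize) {
            pause();
            servedInBatch = 0;
        }
        servedInBatch++;
        return delegate.next();
    }

    private void pause() {
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt so the surrounding task can still be cancelled.
            Thread.currentThread().interrupt();
        }
    }
}
```

Wrapping a row iterator as new ThrottledIterator<>(rows, 100, 50L) would
pause 50 ms after every 100 rows. Whether that helps more than simply
running fewer map tasks would have to be measured on the cluster in
question.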

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
