From: Michael Moores
To: "user@cassandra.apache.org"
Date: Wed, 20 Oct 2010 15:35:04 -0700
Subject: Re: Throttling ColumnFamilyRecordReader

sorry, i had a misunderstanding of the MapRed report output.
i did reduce mapreduce.tasktracker.map.tasks.maximum (number of concurrent maps per node) from the default of 2 to 1.
i suppose if i want to do this on a per job/user basis i'll try out the hadoop fair scheduler.

On Oct 19, 2010, at 1:27 PM, Jonathan Ellis wrote:

> (Moving to user@.)
>
> Isn't reducing the number of map tasks the easiest way to tune this?
>
> Also: in 0.7 you can use NetworkTopologyStrategy to designate a group
> of nodes as your hadoop "datacenter" so the workloads won't overlap.
>
> On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores wrote:
>> Does it make sense to add some kind of throttle capability on the ColumnFamilyRecordReader for Hadoop?
>>
>> If I have 60 or so Map tasks running at the same time when the cluster is already heavily loaded with OLTP operations, I can get some decreased on-line performance
>> that may not be acceptable. (I'm loading an 8 node cluster with 2000 TPS.) By default my cluster of 8 nodes (which are also the Hadoop JobTracker nodes) has 8 Map tasks per node making the get_range_slices call, based on what the ColumnFamilyInputFormat has calculated from my token ranges.
>> I can increase the inputSplitSize (ConfigHelper.setInputSplitSize()) so that there
>> is only one Map task per node, and this helps quite a bit.
>>
>> But is it reasonable to provide a configurable sleep to cause a wait in between smaller size range queries? That would stretch out the Map time
>> and let the OLTP processing be less affected.
>>
>>
>> --Michael
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
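
For illustration only, the "configurable sleep between smaller range queries" idea from the original question could be sketched roughly as below. This is not code from Cassandra or Hadoop: ThrottledIterator, batchSize, pauseMillis and pauseCount() are hypothetical names, and a plain Iterator stands in for the rows a record reader would fetch one range query at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Cassandra or Hadoop. It wraps any row
// iterator and sleeps before starting each new batch of rows, standing in
// for a pause between successive small range queries.
final class ThrottledIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final int batchSize;     // rows served between pauses (assumed knob)
    private final long pauseMillis;  // length of each pause (assumed knob)
    private int servedInBatch = 0;
    private int pauses = 0;

    ThrottledIterator(Iterator<T> source, int batchSize, long pauseMillis) {
        if (batchSize <= 0 || pauseMillis < 0) {
            throw new IllegalArgumentException("batchSize must be > 0 and pauseMillis >= 0");
        }
        this.source = source;
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        // A full batch has been served and more rows remain: back off first.
        if (servedInBatch == batchSize) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while throttling", e);
            }
            pauses++;
            servedInBatch = 0;
        }
        servedInBatch++;
        return source.next();
    }

    // Number of pauses taken so far; handy for checking the throttle in tests.
    int pauseCount() {
        return pauses;
    }
}
```

A throttle like this only stretches out the work of a single map task. Capping concurrent map tasks per node or raising the input split size, as discussed in the thread, limits how many such tasks hit a node at once.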