From: "Flachbart, Dirk (HP Software - TransactionVision)"
To: "user@cassandra.apache.org"
Date: Mon, 28 Feb 2011 17:24:19 +0000
Subject: Question about insert performance in multiple node cluster

Hi,

We are trying to use Cassandra for high-performance insertion of simple key/value records. I have set up Cassandra on two of my machines in my local network (Windows Server 2008), using pretty much the default configuration. I created a test driver in Java (using Thrift) which inserts a single 1K data column (keys are unique strings of integer values) with multiple threads. On each machine I am able to achieve around 9,000 inserts/sec when running the test driver with the local Cassandra server.

Then I set up a cluster with both machines, and ran the same test again (the test driver is still local to one of the Cassandra nodes). Surprisingly, I did not see any improvement in the insert performance; I got the same 9,000 inserts/sec as when running with a single node. I know that I shouldn't expect linear scaling to 18,000 operations/sec, but shouldn't I see at least some significant improvement? The CPU isn't fully loaded on either of the machines, and the network utilization is low too (1000 Mbit network). Later on I also tested adding a third node, but that didn't improve anything either.

I suspect I'm doing something wrong with setting up the cluster. The only changes I made on the second machine were:

- AutoBootstrap=true
- Setting 'Seed' to the IP of the other node

Did I miss anything? Or am I simply wrong in expecting the throughput to scale when using multiple nodes?

Thanks,
Dirk
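
A test driver of the kind described above could be sketched roughly as follows. This is only an illustrative outline under assumptions, not the actual driver from this message: `InsertThroughputSketch`, `KeyValueSink`, `Result`, and `run` are hypothetical names, `KeyValueSink` stands in for the real Thrift insert call, and the in-memory map in `main` exists only so the sketch runs without a cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class InsertThroughputSketch {

    /** Hypothetical stand-in for the real client call (for example a Thrift insert). */
    public interface KeyValueSink {
        void insert(String key, byte[] value) throws Exception;
    }

    /** Totals collected from one run. */
    public static final class Result {
        public final long inserts;
        public final long failures;
        public final double insertsPerSecond;

        Result(long inserts, long failures, double insertsPerSecond) {
            this.inserts = inserts;
            this.failures = failures;
            this.insertsPerSecond = insertsPerSecond;
        }
    }

    /** Inserts threads * insertsPerThread records with unique integer-string keys. */
    public static Result run(int threads, int insertsPerThread, KeyValueSink sink) {
        final byte[] payload = new byte[1024]; // one 1K column value per record
        final AtomicLong ok = new AtomicLong();
        final AtomicLong failed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int base = t * insertsPerThread; // disjoint key range per thread
            pool.execute(() -> {
                for (int i = 0; i < insertsPerThread; i++) {
                    try {
                        sink.insert(Integer.toString(base + i), payload);
                        ok.incrementAndGet();
                    } catch (Exception e) {
                        failed.incrementAndGet(); // count the failure and keep going
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Result(ok.get(), failed.get(), ok.get() / seconds);
    }

    public static void main(String[] args) {
        // Placeholder sink: an in-memory map instead of a Cassandra connection.
        Map<String, byte[]> store = new ConcurrentHashMap<>();
        Result r = run(8, 10_000, store::put);
        System.out.printf("%d inserts, %d failures, %.0f inserts/sec%n",
                r.inserts, r.failures, r.insertsPerSecond);
    }
}
```

To point the sketch at a real cluster, `KeyValueSink` would be implemented with an actual client. Thrift clients are generally not safe to share between threads, so one connection per worker thread is the usual assumption. The figure reported is client-side throughput, measured the same way whether the sink talks to one node or several.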