incubator-cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject Re: Cassandra benchmarking on Rackspace Cloud
Date Mon, 19 Jul 2010 17:34:39 GMT
This is absolutely your bottleneck, as Brandon mentioned before. Your client machine is maxing
out at 37K requests per second.
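A quick way to check this (a minimal sketch, assuming a Linux client box) is to sample overall CPU use on the stress.py machine mid-run; if it sits near 100%, the client is the ceiling and adding Cassandra nodes won't raise the number:

    # Sample /proc/stat on the stress.py client while a run is in progress.
    import time

    def cpu_times():
        # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
        with open("/proc/stat") as f:
            return [int(v) for v in f.readline().split()[1:]]

    def busy_fraction(interval=5.0):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        delta = [b - a for a, b in zip(before, after)]
        idle = delta[3] + delta[4]              # idle + iowait columns
        return 1.0 - float(idle) / sum(delta)

    if __name__ == "__main__":
        print("client CPU busy: %.0f%%" % (busy_fraction() * 100))
        # Near 100% while stress.py is running means the client, not the
        # cluster, is the bottleneck.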

-----Original Message-----
From: "David Schoonover" <david.schoonover@gmail.com>
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

> How many physical client machines are running stress.py?

One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also ran the test with multiple clients and saw
similar results when summing the reqs/sec.
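For reference, a rough sketch of that multi-client setup: kick off stress.py on two client boxes over ssh and collect their output so the steady-state reads/sec can be summed. The client and node host names are placeholders, and apart from --nodes (which appears later in this thread) the stress.py flags are assumptions that may not match your version:

    import subprocess

    CLIENTS = ["client1", "client2"]            # placeholder client hostnames
    NODES = "cass1,cass2,cass3,cass4"           # full node list for --nodes
    # Flags other than --nodes are assumed; check stress.py --help for your copy.
    CMD = ("python stress.py --nodes %s --operation read "
           "--threads 50 --num-keys 10000000" % NODES)

    procs = [subprocess.Popen(["ssh", host, CMD],
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
             for host in CLIENTS]

    for host, proc in zip(CLIENTS, procs):
        output, _ = proc.communicate()
        print("=== %s ===" % host)
        print(output.decode("utf-8", "replace"))
    # Add up the per-client steady-state reads/sec by hand; if the total climbs
    # as clients are added, the earlier ceiling was client-side.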


On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood <stu.hood@rackspace.com> wrote:
> How many physical client machines are running stress.py?
>
> -----Original Message-----
> From: "David Schoonover" <david.schoonover@gmail.com>
> Sent: Monday, July 19, 2010 12:11pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>
> Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers to add.
>
> In an effort to eliminate everything but the scaling issue, I set up a cluster on dedicated
> hardware (non-virtualized; 8-core, 16G RAM).
>
> No data was loaded into Cassandra -- 100% of requests were misses. This is, so far as
> we can reason about the problem, as fast as the database can perform; disk is out of the picture,
> and the hardware is certainly more than sufficient.
>
> nodes   reads/sec
> 1       53,000
> 2       37,000
> 4       37,000
>
> I ran this test previously on the cloud, with similar results:
>
> nodes   reads/sec
> 1       24,000
> 2       21,000
> 3       21,000
> 4       21,000
> 5       21,000
> 6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second time), with essentially
> identical results.
>
> Other Notes:
>  - stress.py was run in both random and gaussian mode; there was no difference.
>  - Runs were 10+ minutes (where the above number represents an average excluding the
> beginning and the end of the run).
>  - Supplied node lists covered all boxes in the cluster.
>  - Data and commitlog directories were deleted between each run.
>  - Tokens were evenly spaced across the ring, and changed to match cluster size before
> each run (see the token sketch just after this list).
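A minimal sketch of that token spacing step, assuming RandomPartitioner's 0..2**127 token space (each value goes into the matching node's initial token setting before the run):

    # Evenly spaced initial tokens for RandomPartitioner (token space 0 .. 2**127).
    def even_tokens(node_count):
        return [i * (2 ** 127) // node_count for i in range(node_count)]

    for n in (1, 2, 4):
        print("%d nodes: %s" % (n, even_tokens(n)))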
>
> If anyone has explanations or suggestions, they would be quite welcome. This is surprising
> to say the least.
>
> Cheers,
>
> Dave
>
>
>
> On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:
>
>> Hey Oren,
>>
>> The Cloud Servers REST API returns a "hostId" for each server that indicates which
>> physical host you are on: I'm not sure if you can see it from the control panel, but a quick
>> curl session should get you the answer.
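A Python equivalent of that curl session, as a sketch only: the v1.0 auth endpoint, header names, and JSON field names below are assumptions based on the legacy Cloud Servers API and may differ from what your account uses.

    import json
    try:
        from urllib.request import Request, urlopen   # Python 3
    except ImportError:
        from urllib2 import Request, urlopen           # Python 2

    USER, API_KEY = "your-username", "your-api-key"    # placeholders

    # Authenticate; the token and management URL come back as response headers.
    auth = urlopen(Request("https://auth.api.rackspacecloud.com/v1.0",
                           headers={"X-Auth-User": USER, "X-Auth-Key": API_KEY}))
    token = auth.headers["X-Auth-Token"]
    mgmt_url = auth.headers["X-Server-Management-Url"]

    # List servers in detail; each entry should include a hostId identifying the
    # physical machine it runs on -- two servers sharing a hostId share hardware.
    detail = urlopen(Request(mgmt_url + "/servers/detail",
                             headers={"X-Auth-Token": token}))
    for server in json.loads(detail.read().decode("utf-8"))["servers"]:
        print("%s  %s" % (server["name"], server["hostId"]))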
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Oren Benjamin" <oren@clearspring.com>
>> Sent: Monday, July 19, 2010 10:30am
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>>
>> Certainly I'm using multiple cloud servers for the multiple client tests.  Whether
>> or not they are resident on the same physical machine, I just don't know.
>>
>>   -- Oren
>>
>> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
>>
>> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <oren@clearspring.com> wrote:
>> Thanks for the info.  Very helpful in validating what I've been seeing.  As for
>> the scaling limit...
>>
>>>> The above was single node testing.  I'd expect to be able to add nodes and
>>>> scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless
>>>> of the number of nodes in the cluster.
>>>
>>> This is what I would expect if a single machine is handling all the
>>> Thrift requests.  Are you spreading the client connections to all the
>>> machines?
>>
>> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client
>> requests are in fact being dispersed among all the nodes as evidenced by the intermittent
>> TimedOutExceptions in the log which show up against the various nodes in the input list.  Could
>> it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I
>> running into some connection limit?  I don't see anything pegged in the JMX stats.
>>
>> It's unclear if you're using multiple client machines for stress.py or not, but a limitation
>> of 24k/21k for a single quad-proc machine is normal in my experience.
>>
>> -Brandon
>>
>>
>>
>
>
>
>



-- 
LOVE DAVE


