cassandra-commits mailing list archives

From "Daniel Meyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6977) attempting to create 10K column families fails with 100 node cluster
Date Fri, 04 Apr 2014 19:22:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960312#comment-13960312
] 

Daniel Meyer commented on CASSANDRA-6977:
-----------------------------------------

I am not sure that memory is the issue here.  I monitored memory with visualvm and found the
maximum used heap to be only 1GB.  There were no OOM errors in the logs.  Further, if memory
were the issue, I would expect the 5 node cluster to run into it as well; however, with the
5 node cluster this issue does not occur and we are able to create the 10K cfs without a
problem (although it takes a while).

> attempting to create 10K column families fails with 100 node cluster
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6977
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6977
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 100 nodes, Ubuntu 12.04.3 LTS, AWS m1.large instances
>            Reporter: Daniel Meyer
>         Attachments: 100_nodes_all_data.png, all_data_5_nodes.png, keyspace_create.py,
logs.tar, tpstats.txt, visualvm_tracer_data.csv
>
>
> During this test we are attempting to create a total of 1K keyspaces with 10 column families
each to bring the total column families to 10K.  With a 5 node cluster this operation can
be completed; however, it fails with 100 nodes.  Please see the two charts.  For the 5 node
case the time required to create each keyspace and subsequent 10 column families increases
linearly until the number of keyspaces is 1K.  For a 100 node cluster there is a sudden increase
in latency between 450 keyspaces and 550 keyspaces.  The test ends when the test script times
out.  After the test script times out it is impossible to reconnect to the cluster with the
datastax python driver because it cannot connect to the host:
> cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'10.199.5.98':
OperationTimedOut()}
> It was found that running the following stress command does work from the same machine
the test script runs on.
> cassandra-stress -d 10.199.5.98 -l 2 -e QUORUM -L3 -b -o INSERT
> It should be noted that this test was initially done with DSE 4.0 and c* version 2.0.5.24,
and in that case it was not possible to run stress against the cluster even locally on a node
because the host could not be found.
> Attached are system logs from one of the nodes, charts showing schema creation latency
for 5 and 100 node clusters, visualvm tracer data for cpu, memory, num_threads and gc
runs, tpstats output, and the test script.
> The test script was on an m1.large aws instance outside of the cluster under test.
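
For readers without access to the attachments, the workload described above can be
approximated with a minimal sketch along the following lines.  This is not the attached
keyspace_create.py; the keyspace and table names, the column layout, the SimpleStrategy
replication settings and the helper function names are all assumptions made for
illustration.  The sketch only generates the CQL and times each keyspace plus its 10 column
families through whatever `execute` callable is passed in.

```python
import time


def schema_statements(num_keyspaces=1000, cfs_per_keyspace=10, replication_factor=3):
    """Yield (keyspace_index, [cql, ...]) for each keyspace and its column families.

    Names (ks_N, cf_N), the two-column layout and the replication settings are
    hypothetical; the original test script may differ.
    """
    for ks in range(num_keyspaces):
        name = "ks_%d" % ks
        stmts = [
            "CREATE KEYSPACE %s WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': %d}"
            % (name, replication_factor)
        ]
        for cf in range(cfs_per_keyspace):
            stmts.append(
                "CREATE TABLE %s.cf_%d (key text PRIMARY KEY, value text)" % (name, cf)
            )
        yield ks, stmts


def time_schema_creation(execute, **kwargs):
    """Send every statement through `execute`; return per-keyspace latency in seconds."""
    latencies = []
    for _ks, stmts in schema_statements(**kwargs):
        start = time.time()
        for stmt in stmts:
            execute(stmt)
        latencies.append(time.time() - start)
    return latencies
```

With the datastax python driver, `execute` would presumably be the `execute` method of a
session obtained from `cassandra.cluster.Cluster([...]).connect()`.  Plotting the returned
latencies against the keyspace index should give a chart comparable to the attached ones.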



--
This message was sent by Atlassian JIRA
(v6.2#6252)
