incubator-cassandra-user mailing list archives

From Benjamin Black <b@b3k.us>
Subject Re: Benchmarking Cassandra 0.6.5 with YCSB client ... drags to a halt
Date Sat, 28 Aug 2010 23:54:33 GMT
That means you only have a 1G heap.  It's no surprise it dies (most
likely OOM; CMS runs are not inherently bad).  I don't immediately see
why the remote latency goes up that high, but it is unlikely to be a
Cassandra problem.
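
In 0.6 the heap is set by the -Xms/-Xmx flags in JVM_OPTS in
cassandra.in.sh, and HotSpot honors the last -Xmx it sees, so a minimal
sketch of the change would be appending something like this (2G is only
an example, size it to the machine's RAM):

  # cassandra.in.sh (0.6): override the 1G default heap
  # example values only, not a recommendation
  JVM_OPTS="$JVM_OPTS -Xms2G -Xmx2G"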

On Sat, Aug 28, 2010 at 4:01 PM, Fernando Racca <fracca@gmail.com> wrote:
> cassandra.in.sh is the default; I just changed the JMX port.
> storage-conf.xml:
> <Storage>
>   <ClusterName>Benchmark Cluster</ClusterName>
>   <AutoBootstrap>true</AutoBootstrap>
>   <HintedHandoffEnabled>true</HintedHandoffEnabled>
>   <Keyspaces>
>     <Keyspace Name="usertable">
>       <ColumnFamily Name="data" CompareWith="UTF8Type"/>
>       <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
>       <ReplicationFactor>2</ReplicationFactor>
>       <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
>     </Keyspace>
>   </Keyspaces>
>   <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
>   <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>
>   <InitialToken></InitialToken>
>   <CommitLogDirectory>/Developer/Applications/cassandra/commitlog</CommitLogDirectory>
>   <DataFileDirectories>
>     <DataFileDirectory>/Developer/Applications/cassandra/data</DataFileDirectory>
>   </DataFileDirectories>
>   <Seeds>
>     <Seed>192.168.1.2</Seed> <!-- primary node -->
>     <Seed>192.168.1.4</Seed> <!-- secondary node -->
>   </Seeds>
>   <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
>   <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
>   <ListenAddress>192.168.1.2</ListenAddress>
>   <StoragePort>7000</StoragePort>
>   <ThriftAddress>192.168.1.2</ThriftAddress>
>   <ThriftPort>9160</ThriftPort>
>   <ThriftFramedTransport>false</ThriftFramedTransport>
>   <DiskAccessMode>auto</DiskAccessMode>
>   <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>   <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>   <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>   <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>   <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>   <MemtableThroughputInMB>64</MemtableThroughputInMB>
>   <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>   <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>   <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>   <ConcurrentReads>8</ConcurrentReads>
>   <ConcurrentWrites>32</ConcurrentWrites>
>   <CommitLogSync>periodic</CommitLogSync>
>   <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>   <GCGraceSeconds>864000</GCGraceSeconds>
> </Storage>
>
> When I run both server and client locally, no clustering, I'm not
> experiencing any delays. It averages 5000 ops a second, maxes out the CPU,
> and the network card outputs 11mb/sec.
> Unfortunately, when trying to generate load remotely, the client is
> extremely slow. It seems unable to send more than 500kb/sec, even though I
> should be able to do at least 1.5mb/sec, e.g. when copying over scp. My
> laptops are connected wirelessly through a router, so the network was never
> going to be fast, but this is too slow.
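> To rule out the link itself, a quick raw-TCP check with iperf (assuming
> it's installed on both laptops) would look like:
>
>   # on the server laptop
>   iperf -s
>   # on the client laptop: 30-second throughput test against the server
>   iperf -c 192.168.1.2 -t 30
>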
> The client is pure Thrift-based code:
> http://github.com/brianfrankcooper/YCSB/blob/master/db/cassandra-0.6/src/com/yahoo/ycsb/db/CassandraClient6.java
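> For reference, I'm invoking it roughly like this (the classpath layout,
> hosts property name, workload file, and thread count are from my checkout
> and are illustrative, so the exact spelling may differ):
>
>   java -cp "build/ycsb.jar:db/cassandra-0.6/lib/*" \
>        com.yahoo.ycsb.Client -load \
>        -db com.yahoo.ycsb.db.CassandraClient6 \
>        -P workloads/workloada \
>        -p hosts=192.168.1.2,192.168.1.4 \
>        -threads 10
>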
>
> On localhost the latency is <3 ms:
> Starting test.
>  0 sec: 0 operations;
>  10 sec: 54012 operations; 5383.43 current ops/sec; [INSERT AverageLatency(ms)=1.7]
>  20 sec: 102657 operations; 4863.53 current ops/sec; [INSERT AverageLatency(ms)=1.99]
>  30 sec: 151330 operations; 4867.3 current ops/sec; [INSERT AverageLatency(ms)=1.97]
>  40 sec: 199265 operations; 4790.15 current ops/sec; [INSERT AverageLatency(ms)=2]
>  50 sec: 246070 operations; 4676.76 current ops/sec; [INSERT AverageLatency(ms)=2.07]
>  60 sec: 298864 operations; 5278.87 current ops/sec; [INSERT AverageLatency(ms)=1.81]
>  70 sec: 340002 operations; 4113.8 current ops/sec; [INSERT AverageLatency(ms)=2.37]
>  80 sec: 386824 operations; 4682.2 current ops/sec; [INSERT AverageLatency(ms)=2.05]
>  90 sec: 431027 operations; 4420.3 current ops/sec; [INSERT AverageLatency(ms)=2.18]
>  100 sec: 483440 operations; 5241.82 current ops/sec; [INSERT AverageLatency(ms)=1.81]
>  110 sec: 523785 operations; 4034.5 current ops/sec; [INSERT AverageLatency(ms)=2.39]
>  120 sec: 576850 operations; 5306.5 current ops/sec; [INSERT AverageLatency(ms)=1.79]
>  130 sec: 622157 operations; 4530.25 current ops/sec; [INSERT AverageLatency(ms)=2.13]
>  140 sec: 669102 operations; 4694.5 current ops/sec; [INSERT AverageLatency(ms)=2.05]
>  150 sec: 714394 operations; 4529.2 current ops/sec; [INSERT AverageLatency(ms)=2.13]
>  160 sec: 760176 operations; 4578.2 current ops/sec; [INSERT AverageLatency(ms)=2.09]
>  170 sec: 809245 operations; 4906.9 current ops/sec; [INSERT AverageLatency(ms)=1.96]
>  180 sec: 855002 operations; 4574.33 current ops/sec; [INSERT AverageLatency(ms)=2.11]
>  190 sec: 904312 operations; 4930.51 current ops/sec; [INSERT AverageLatency(ms)=1.93]
>  200 sec: 949707 operations; 4539.5 current ops/sec; [INSERT AverageLatency(ms)=2.12]
>  210 sec: 998662 operations; 4895.99 current ops/sec; [INSERT AverageLatency(ms)=1.71]
>  210 sec: 1000000 operations; 3387.34 current ops/sec; [INSERT AverageLatency(ms)=0.38]
> Remotely it is ~30ms:
> Loading workload...
> Starting test.
>  0 sec: 0 operations;
>  10 sec: 3369 operations; 336.4 current ops/sec; [INSERT AverageLatency(ms)=29.13]
>  20 sec: 6775 operations; 340.57 current ops/sec; [INSERT AverageLatency(ms)=29.29]
>  30 sec: 10194 operations; 341.9 current ops/sec; [INSERT AverageLatency(ms)=29.2]
>  40 sec: 13659 operations; 346.5 current ops/sec; [INSERT AverageLatency(ms)=28.81]
>  50 sec: 17108 operations; 344.87 current ops/sec; [INSERT AverageLatency(ms)=28.94]
>  60 sec: 20584 operations; 347.6 current ops/sec; [INSERT AverageLatency(ms)=28.72]
>  70 sec: 24017 operations; 343.27 current ops/sec; [INSERT AverageLatency(ms)=29.04]
>  80 sec: 27458 operations; 344.1 current ops/sec; [INSERT AverageLatency(ms)=29]
>  90 sec: 30939 operations; 348.1 current ops/sec; [INSERT AverageLatency(ms)=28.7]
>  100 sec: 34399 operations; 346 current ops/sec; [INSERT AverageLatency(ms)=28.83]
>  110 sec: 37888 operations; 348.9 current ops/sec; [INSERT AverageLatency(ms)=28.61]
>  120 sec: 41381 operations; 349.27 current ops/sec; [INSERT AverageLatency(ms)=28.59]
> When running the same job with both server and client on my second box, it
> logs repeated concurrent mark-sweep GCs and eventually the node dies:
>  INFO 23:56:15,739 GC for ConcurrentMarkSweep: 1288 ms, 5201448 reclaimed leaving 1077816616 used; max is 1207828480
>  INFO 23:56:15,739 Pool Name                    Active   Pending
>  INFO 23:56:15,742 STREAM-STAGE                      0         0
>  INFO 23:56:15,743 FILEUTILS-DELETE-POOL             0         0
>  INFO 23:56:15,744 RESPONSE-STAGE                    0         0
>  INFO 23:56:15,744 ROW-READ-STAGE                    0         0
>  INFO 23:56:15,745 LB-OPERATIONS                     0         0
>  INFO 23:56:15,745 MISCELLANEOUS-POOL                0         0
>  INFO 23:56:15,746 GMFD                              0         2
>  INFO 23:56:15,747 CONSISTENCY-MANAGER               0         0
>  INFO 23:56:15,747 LB-TARGET                         0         0
>  INFO 23:56:15,748 ROW-MUTATION-STAGE                0         6
>  INFO 23:56:15,749 MESSAGE-STREAMING-POOL            0         0
>  INFO 23:56:15,749 LOAD-BALANCER-STAGE               0         0
>  INFO 23:56:15,750 FLUSH-SORTER-POOL                 0         0
>  INFO 23:56:15,750 MEMTABLE-POST-FLUSHER             1         1
>  INFO 23:56:15,751 AE-SERVICE-STAGE                  0         0
>  INFO 23:56:15,751 FLUSH-WRITER-POOL                 1         1
>  INFO 23:56:15,752 HINTED-HANDOFF-POOL               0         0
>  INFO 23:56:15,752 CompactionManager               n/a         1
>  INFO 23:56:17,491 GC for ConcurrentMarkSweep: 1648 ms, 5986176 reclaimed leaving 1077634256 used; max is 1207828480
>  INFO 23:56:17,492 Pool Name                    Active   Pending
>  INFO 23:56:17,501 STREAM-STAGE                      0         0
>  INFO 23:56:17,501 FILEUTILS-DELETE-POOL             0         0
>  INFO 23:56:17,502 RESPONSE-STAGE                    0         1
>  INFO 23:56:17,502 ROW-READ-STAGE                    0         0
>  INFO 23:56:17,503 LB-OPERATIONS                     0         0
>  INFO 23:56:17,503 MISCELLANEOUS-POOL                0         0
>  INFO 23:56:17,504 GMFD                              0         0
>  INFO 23:56:17,504 CONSISTENCY-MANAGER               0         0
>  INFO 23:56:17,504 LB-TARGET                         0         0
>  INFO 23:56:17,505 ROW-MUTATION-STAGE                0         2
>  INFO 23:56:17,505 MESSAGE-STREAMING-POOL            0         0
>  INFO 23:56:17,508 LOAD-BALANCER-STAGE               0         0
>  INFO 23:56:17,514 FLUSH-SORTER-POOL                 0         0
>  INFO 23:56:17,515 MEMTABLE-POST-FLUSHER             1         1
>  INFO 23:56:17,519 AE-SERVICE-STAGE                  0         0
>  INFO 23:56:17,527 FLUSH-WRITER-POOL                 1         1
>  INFO 23:56:17,528 HINTED-HANDOFF-POOL               0         0
>  INFO 23:56:18,913 CompactionManager               n/a         1
>  INFO 23:56:20,591 GC for ConcurrentMarkSweep: 1675 ms, 6052824 reclaimed leaving 1077609920 used; max is 1207828480
>  INFO 23:56:20,592 Pool Name                    Active   Pending
>  INFO 23:56:20,611 STREAM-STAGE                      0         0
>  INFO 23:56:20,612 FILEUTILS-DELETE-POOL             0         0
>  INFO 23:56:20,613 RESPONSE-STAGE                    2       158
>  INFO 23:56:20,613 ROW-READ-STAGE                    0         0
>  INFO 23:56:20,614 LB-OPERATIONS                     0         0
>  INFO 23:56:20,614 MISCELLANEOUS-POOL                0         0
>  INFO 23:56:20,615 GMFD                              0         0
>  INFO 23:56:20,616 CONSISTENCY-MANAGER               0         0
>  INFO 23:56:20,616 LB-TARGET                         0         0
>  INFO 23:56:20,617 ROW-MUTATION-STAGE                0         1
>  INFO 23:56:20,617 MESSAGE-STREAMING-POOL            0         0
>  INFO 23:56:20,618 LOAD-BALANCER-STAGE               0         0
>  INFO 23:56:20,625 FLUSH-SORTER-POOL                 0         0
>
> The problem seems to be with the second node...
> Any ideas?
> On 28 August 2010 22:49, Benjamin Black <b@b3k.us> wrote:
>>
>> cassandra.in.sh?
>> storage-conf.xml?
>> output of iostat -x while this is going on?
>> turn GC log level to debug?
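>>
>> e.g., something like this (iostat is from sysstat; the JVM flags are the
>> standard Sun ones, shown here as an alternative to log4j debug):
>>
>>   # extended disk stats every 5 seconds while the load runs
>>   iostat -x 5
>>
>>   # appended to JVM_OPTS in cassandra.in.sh for verbose GC detail
>>   JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"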
>>
>> On Sat, Aug 28, 2010 at 2:02 PM, Fernando Racca <fracca@gmail.com> wrote:
>> > Hi,
>> > I'm currently executing some benchmarks against 0.6.5, which I plan to
>> > compare against 0.7-beta1, using the YCSB client.
>> > I'm experiencing some strange behaviour when running a small 2-node
>> > cluster using OrderPreservingPartitioner. Does anybody have experience
>> > using the client to generate load?
>> > It's the first benchmark I've tried, so I'm probably doing something dumb.
>> > A detailed post with screenshots of the VM and CPU history is here:
>> > http://quantleap.blogspot.com/2010/08/cassandra-065-benchmarking-first-run.html
>> > I would very much appreciate your help, since I'm doing these benchmarks
>> > as part of my master's dissertation.
>> > A previous official benchmark is documented here:
>> > http://research.yahoo.com/files/ycsb-v4.pdf
>> > Thanks!
>> > Fernando Racca
>
>
