incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andras Szerdahelyi <andras.szerdahe...@ignitionone.com>
Subject Read latency skyrockets after 95-98th percentile
Date Tue, 31 Jul 2012 12:52:49 GMT
hey list,

i've been trying to understand why we are seeing rather nasty read latency peaks in as much
as 2% of our total read ( not sure what the underlying thrift call is, should be get_slice
) requests that we have been **unable to tie to garbage collection or blocking I/O.**
This is what i mean by "nasty" :
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.0242499999999999, 95th: 2.1103499999999977,
98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041

This is measured around the Astyanax / Hector wrappers for running column slice queries and
iterating over their results. FWIW they perform very similarly.

Now, the mean is awesome - 1.2ms for a read is really really fast, i think thrift overhead
makes up for probably 1ms alone. What i can't live with is pretty much everything thats beyond
the 98th percentile. Perhaps my expectations are misaligned to begin with and this is great,
but the dataset under test that yielded this result ( see below ) is extremely small - only
a 1000 items per test made up of a few small dynamic composite columns  and the read is being
done sequentially, from a single threaded client. I just don't understand whats causing the
peaks.

Here's the complete run-down on the setup:
The data is dynamic composite columns with string or integer values. The DynamicComposites
themselves are made up of 3 Components each: <String,String,Integer> and <String,Integer,Integer>

The query is getting all columns ( in this particular test, that will be a single column )
of  one of the composite types ( <String, String, Integer> or <String, Integer, Integer>
) for a particular string key.
( How this is achieved is more or less at https://gist.github.com/3207894 )


The production cluster is set up as follows:
30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas per DC, read
consistency: ONE, 2DCs, 2 nodes per DC )
8GB multi-core physical hosts
most settings on default in cassandra.yaml


In an attempt to isolate the problem, i've done the following ( this is where the results
above are coming from but we see roughly the same numbers in prod )

- created a short unit test around the ( two ) classes that wrap the Astyanax and Hector libraries
in our application, in which i'm only testing the read performance
- in setUp() i start Cassandra with

CassandraStoreKeeperTest.cassandra = new CassandraDaemon();
CassandraStoreKeeperTest.cassandra.init(null);
CassandraStoreKeeperTest.cassandra.start();

- same cassandra yaml as in prod, file at https://gist.github.com/3207894 ( mostly defaults
)
- i pass the following flags to the JVM running my unit test:  -server -Xmx2G and all the
GC flags from cassandra-env.sh, plus the jamm java agent
- i warm up the embedded cassandra with 1k reads and 1k writes
- the test is 1000 single threaded, sequential reads via Hector and Astyanax ( no significant
differences between the two ) via thrift to the embedded single-node Cassandra instance
- i read back the 1000 items saved in the warm-up

The results are
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.0242499999999999, 95th: 2.1103499999999977,
98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041

here's the GC output during the read

[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     281480 bytes,     281480 total
: 17322K->308K(19136K), 0.0037375 secs] 28742K->11885K(83008K), 0.0038170 secs] [Times:
user=0.01 sys=0.00, real=0.01 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     261360 bytes,     261360 total
: 17332K->386K(19136K), 0.0034069 secs] 28909K->12098K(83008K), 0.0034849 secs] [Times:
user=0.00 sys=0.01, real=0.00 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     241336 bytes,     241336 total
: 17410K->383K(19136K), 0.0035252 secs] 29122K->12209K(83008K), 0.0036062 secs] [Times:
user=0.00 sys=0.00, real=0.00 secs]

Doesn't look like GC ( i could be wrong ) - what else could it be ?

all suggestions are highly appreciated..
Thanks!
Andras


ps: this is with Cassandra 1.1.0





Mime
View raw message