cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chuck Reynolds <creyno...@ancestry.com>
Subject Re: Random slow read times in Cassandra
Date Fri, 17 Mar 2017 21:21:56 GMT
It look looks you maybe correct.

From visual Vm I see the cpu is spiking any time I have GC activity.


Here are my JVM settings.  Does anything stick out?
VM_OPTS="$JVM_OPTS -XX:+CMSClassUnloadingEnabled"
JVM_OPTS="$JVM_OPTS -XX:+UseThreadPriorities"
JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=120m"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000"
JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"



[cid:image001.png@01D29F32.3631A4D0]




From: Jonathan Haddad <jon@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 17, 2017 at 1:24 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Random slow read times in Cassandra

Probably Jvm pauses. Check your logs for long GC times.
On Fri, Mar 17, 2017 at 11:51 AM Chuck Reynolds <creynolds@ancestry.com<mailto:creynolds@ancestry.com>>
wrote:
I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has consistently random high read
times.  In general most reads are under 10 milliseconds but with in the 30 request there is
usually a read time that is a couple of seconds.

Instance type: r4.8xlarge
EBS GP2 volumes, 3tb with 9000 IOPS
30 Gig Heap

Data per node is about 170 gigs

The keyspace is an id & a blob.  When I check the data the slow reads don’t seem to
have anything to do with size of the blobs

This system has repairs run once a weeks because it takes a lot of updates.

The client makes a call and does 30 request serially to Cassandra and the response times look
like this in milliseconds.

What could make these so slow and what can I do to diagnosis this?


Responses

Get Person time: 3 319746229:9009:66
Get Person time: 7 1830093695:9009:66
Get Person time: 4 30072253:9009:66
Get Person time: 4 2303790089:9009:66
Get Person time: 2 156792066:9009:66
Get Person time: 8 491230624:9009:66
Get Person time: 7 284904599:9009:66
Get Person time: 4 600370489:9009:66
Get Person time: 2 281007386:9009:66
Get Person time: 4 971178094:9009:66
Get Person time: 1 1322259885:9009:66
Get Person time: 2 1937958542:9009:66
Get Person time: 9 286536648:9009:66
Get Person time: 9 1835633470:9009:66
Get Person time: 2 300867513:9009:66
Get Person time: 3 178975468:9009:66
Get Person time: 2900 293043081:9009:66
Get Person time: 8 214913830:9009:66
Get Person time: 2 1956710764:9009:66
Get Person time: 4 237673776:9009:66
Get Person time: 17 68942206:9009:66
Get Person time: 1800 20072145:9009:66
Get Person time: 2 304698506:9009:66
Get Person time: 2 308177320:9009:66
Get Person time: 2 998436038:9009:66
Get Person time: 10 1036890112:9009:66
Get Person time: 1 1629649548:9009:66
Get Person time: 6 1595339706:9009:66
Get Person time: 4 1079637599:9009:66
Get Person time: 3 556342855:9009:66


Get Person time: 5 1856382256:9009:66
Get Person time: 3 1891737174:9009:66
Get Person time: 2 1179373651:9009:66
Get Person time: 2 1482602756:9009:66
Get Person time: 3 1236458510:9009:66
Get Person time: 11 1003159823:9009:66
Get Person time: 2 1264952556:9009:66
Get Person time: 2 1662234295:9009:66
Get Person time: 1 246108569:9009:66
Get Person time: 5 1709881651:9009:66
Get Person time: 3213 11878078:9009:66
Get Person time: 2 112866483:9009:66
Get Person time: 2 201870153:9009:66
Get Person time: 6 227696684:9009:66
Get Person time: 2 1946780190:9009:66
Get Person time: 2 2197987101:9009:66
Get Person time: 18 1838959725:9009:66
Get Person time: 3 1782937802:9009:66
Get Person time: 3 1692530939:9009:66
Get Person time: 9 1765654196:9009:66
Get Person time: 2 1597757121:9009:66
Get Person time: 2 1853127153:9009:66
Get Person time: 3 1533599253:9009:66
Get Person time: 6 1693244112:9009:66
Get Person time: 6 82047537:9009:66
Get Person time: 2 96221961:9009:66
Get Person time: 4 98202209:9009:66
Get Person time: 9 12952388:9009:66
Get Person time: 2 300118652:9009:66
Get Person time: 10 78801084:9009:66


Get Person time: 13 1856424913:9009:66
Get Person time: 2 255814186:9009:66
Get Person time: 2 1183397424:9009:66
Get Person time: 5 1828603730:9009:66
Get Person time: 9 132965919:9009:66
Get Person time: 4 1616190071:9009:66
Get Person time: 2 15929337:9009:66
Get Person time: 10 297005427:9009:66
Get Person time: 2 1306460047:9009:66
Get Person time: 5 620139216:9009:66
Get Person time: 2 1364349058:9009:66
Get Person time: 3 629543403:9009:66
Get Person time: 5 1299827034:9009:66
Get Person time: 4 1593205912:9009:66
Get Person time: 2 1755460077:9009:66
Get Person time: 2 1906388666:9009:66
Get Person time: 1 1838653952:9009:66
Get Person time: 2 2249662508:9009:66
Get Person time: 3 1931708432:9009:66
Get Person time: 2 2177004948:9009:66
Get Person time: 2 2042756682:9009:66
Get Person time: 5 41764865:9009:66
Get Person time: 4023 1733384704:9009:66
Get Person time: 1 1614842189:9009:66
Get Person time: 2 2194211396:9009:66
Get Person time: 3 1711330834:9009:66
Get Person time: 2 2264849689:9009:66
Get Person time: 3 1819027970:9009:66
Get Person time: 2 1978614851:9009:66
Get Person time: 1 1863483129:9009:66

Mime
View raw message