incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: performance problems on new cluster
Date Tue, 16 Aug 2011 02:12:26 GMT
Just checking: do you have read_repair_chance set to something? The second request is going
to all replicas, which should only happen with CL ONE if read repair is running for the request.
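
If you want to rule that out while testing, you can turn it off per CF from cassandra-cli; a minimal example, assuming the ks1 / cf1 names from your logs below:

    use ks1;
    update column family cf1 with read_repair_chance = 0;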


The exceptions are happening during read repair, which runs async to the main read request.
They occur after we have detected a digest mismatch, when the process is trying to reconcile
the full data responses from the replicas. The AssertionError is happening because a replica
sent a digest response. The NPE is probably happening because the response did not include
a row; how / why the response is not marked as a digest is a mystery.
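
For reference, the reconcile step looks roughly like this; a from-memory sketch of RowRepairResolver.resolve(), not the exact 0.8 source, and the helper name is made up:

    // Sketch only: after a digest mismatch the coordinator asks every
    // replica for full data and reconciles the responses here.
    ColumnFamily resolved = null;
    for (Message message : replies)
    {
        ReadResponse result = parseResponse(message); // hypothetical helper
        assert !result.isDigestQuery();    // the AssertionError: replica sent a digest
        ColumnFamily cf = result.row().cf; // the NPE: the response carried no row
        resolved = (resolved == null) ? cf : resolved.resolve(cf);
    }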

This may be related to the main problem. If not, don't forget to come back to it.

In your first log with the timeout something is not right…
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3,  .
The reading… log messages are written before the inter-node messages are sent. For this
CL ONE read, only node dc1host3 is involved and it has been asked for the data response.
That makes sense if read repair is not running for the request.

*But* the timeout error says we got a response from dc1host3. One way I can see that happening
is dc1host3 returning a digest instead of a data response (see o.a.c.service.ReadCallback.response(Message)),
which kind of matches what we saw above.
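
That would also explain the odd counter in the timeout message: the callback counts every response, but only signals the waiting read when real data has arrived. Roughly, as another from-memory sketch rather than the exact 0.8 source:

    // Sketch of o.a.c.service.ReadCallback.response(Message), approximate.
    public void response(Message message)
    {
        resolver.preprocess(message);
        int n = received.incrementAndGet();            // digest and data both count here,
                                                       // hence "received only 1 responses"
        if (n >= blockfor && resolver.isDataPresent()) // stays false if only digests arrived
            condition.signal();                        // so the read waits out rpc_timeout
    }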

We need some more extensive logging and probably a trip to https://issues.apache.org/jira/browse/CASSANDRA

It would be good to see full DEBUG logs from both dc1host1 and dc1host3 if you can reproduce
a fault like the first one. Turn off read repair to make things a little simpler. If that's
too much, we need DEBUG logging for StorageProxy, ReadCallback and ReadVerbHandler.
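
The simplest way to get those persistently is conf/log4j-server.properties on each node; assuming the default 0.8 class layout, something like:

    log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG
    log4j.logger.org.apache.cassandra.service.ReadCallback=DEBUG
    log4j.logger.org.apache.cassandra.db.ReadVerbHandler=DEBUG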

Can you update the email thread with the ticket?

Thanks
A

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/08/2011, at 7:34 PM, Anton Winter wrote:

>> OK, node latency is fine and you are using some pretty low
>> consistency. You said NTS with RF 2, is that RF 2 for each DC?
> 
> Correct, I'm using RF 2 for each DC.
> 
> 
> 
> I was able to reproduce the CLI timeouts on the non-replica nodes.
> 
> The debug log output from dc1host1 (a non-replica node):
> 
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,183 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:15,187 StorageProxy.java (line 546) reading data from /dc1host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:26:35,191 StorageProxy.java (line 593) Read timeout: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from /dc1host3,  .
> 
> 
> If the query is run again on the same node (dc1host1) 0 rows are returned and the following DEBUG messages are logged:
> 
> 
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 546) reading data from /dc1host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,513 StorageProxy.java (line 562) reading digest from /dc1host2
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc2host2
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host2
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc3host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host3
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:05,514 StorageProxy.java (line 562) reading digest from /dc4host2
> DEBUG [pool-2-thread-14] 2011-08-15 05:32:06,022 StorageProxy.java (line 588) Read: 508 ms.
> ERROR [ReadRepairStage:2112] 2011-08-15 05:32:06,404 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:2112,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:73)
>        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 
> 
> Subsequent queries afterwards show "reading data from /dc1host2", however the result returned remains 0 rows.
> 
> 
> If I run the same query on a replica I get the correct result returned but with 2 exceptions as follows:
> 
> 
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,792 StorageProxy.java (line 518) Command/ConsistencyLevel is SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 541) reading data locally
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from /dc1host3
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host2/dc3host2
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,793 StorageProxy.java (line 562) reading digest from dns.entry.for.dc3host3/dc3host3
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host2/dc2host2
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dns.entry.for.dc2host3/dc2host3
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host2/dc4host2
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,794 StorageProxy.java (line 562) reading digest from dc4host3/dc4host3
> DEBUG [ReadStage:20102] 2011-08-15 05:45:49,793 StorageProxy.java (line 690) LocalReadRunnable reading SliceFromReadCommand(table='ks1', key='userid1', column_parent='QueryPath(columnFamilyName='cf1', superColumnName='java.nio.HeapByteBuffer[pos=64 lim=67 cap=109]', columnName='null')', start='', finish='', reversed=false, count=1000000)
> DEBUG [pool-2-thread-5] 2011-08-15 05:45:49,795 StorageProxy.java (line 588) Read: 1 ms.
> ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,890 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> java.lang.RuntimeException: java.lang.NullPointerException
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NullPointerException
>        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
>        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        ... 3 more
> ERROR [ReadRepairStage:11] 2011-08-15 05:45:50,892 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[ReadRepairStage:11,5,main]
> java.lang.RuntimeException: java.lang.NullPointerException
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NullPointerException
>        at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:74)
>        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        ... 3 more
> 
> The above have been reproduced in each DC.
> 
> 
> The testing that has been performing poorly happens to be on the non-replica nodes. Initial testing against one of the replicas suggests that the replicas do not exhibit the performance problems that the non-replicas do.
> 
> Regards,
> Anton
> 
> 
> 
> On Fri, 12 Aug 2011 14:11:06 +1200, aaron morton wrote:
>>> 
>>> iostat doesn't show a request queue bottleneck.  The timeouts we are seeing are for reads.  The latency on the nodes I have temporarily used for reads is around 2-45ms.  The next token in the ring at an alternate DC is showing ~4ms with everything else around 0.05ms.  tpstats doesn't show any active/pending.  Reads are at CL.ONE & Writes using CL.ANY
>> 
>> OK, node latency is fine and you are using some pretty low
>> consistency. You said NTS with RF 2, is that RF 2 for each DC?
>> 
>> The steps below may help get an idea of what's going on…
>> 
>> 1) use nodetool getendpoints to determine which replicas a key is on (see the example after this list).
>> 2) connect directly to one of the endpoints with the CLI, ensure CL
>> is ONE and do your test query.
>> 3) connect to another node in the same DC that is not a replica and
>> do the same.
>> 4) connect to another node in a different DC and do the same
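>> 
>> For example, with hypothetical host / keyspace / key names (the CLI defaults to CL ONE):
>> 
>>   nodetool -h dc1host1 getendpoints ks1 cf1 userid1
>>   cassandra-cli -h dc1host3 -p 9160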
>> 
>> Once you can repro it, try turning up the logging on the coordinator
>> to DEBUG; you can do this via JConsole (or scripted over JMX, see the
>> sketch further below). Look for these lines…
>> 
>> * Command/ConsistencyLevel is…
>> * reading data locally… or reading data from…
>> * reading digest locally… or reading digest from…
>> * Read timeout:…
>> 
>> You'll also see some lines about receiving messages from other nodes.
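>> 
>> A minimal sketch of the scripted JMX route, assuming the default JMX
>> port 7199 and the setLog4jLevel operation on the StorageService MBean:
>> 
>>   import javax.management.MBeanServerConnection;
>>   import javax.management.ObjectName;
>>   import javax.management.remote.JMXConnector;
>>   import javax.management.remote.JMXConnectorFactory;
>>   import javax.management.remote.JMXServiceURL;
>> 
>>   public class SetLogLevel {
>>       public static void main(String[] args) throws Exception {
>>           // Connect to the coordinator node's JMX agent.
>>           JMXServiceURL url = new JMXServiceURL(
>>               "service:jmx:rmi:///jndi/rmi://dc1host1:7199/jmxrmi");
>>           JMXConnector c = JMXConnectorFactory.connect(url);
>>           MBeanServerConnection mbs = c.getMBeanServerConnection();
>>           // Turn StorageProxy logging up to DEBUG at runtime.
>>           mbs.invoke(new ObjectName("org.apache.cassandra.db:type=StorageService"),
>>                      "setLog4jLevel",
>>                      new Object[] { "org.apache.cassandra.service.StorageProxy", "DEBUG" },
>>                      new String[] { "java.lang.String", "java.lang.String" });
>>           c.close();
>>       }
>>   }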
>> 
>> Hopefully you can get an idea of which nodes are involved in a
>> failing query. Getting a thrift TimedOutException on a read with CL
>> ONE is pretty odd.
>> 
>>> What can I do in regards to confirming this issue is still outstanding and/or we are affected by it?
>> It's in 0.8 and will not be fixed. My unscientific approach was to
>> repair a single CF at a time, hoping that the differences would be
>> smaller and less data would be streamed.
>> Minor compaction should help squish things down. If you want to get
>> more aggressive, reduce the min compaction threshold and trigger a
>> minor compaction with nodetool flush (example below).
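>> 
>> Something like this, with hypothetical names (check nodetool help for
>> the exact argument form on your version):
>> 
>>   nodetool -h dc1host1 setcompactionthreshold ks1 cf1 2 32
>>   nodetool -h dc1host1 flush ks1 cf1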
>> 
>>> Version of failure detection?  I've not seen anything on this so I suspect this is the default.
>> Was asking so I could see if there were any fixes in Gossip or the
>> FailureDetector that you were missing. Check the CHANGES.txt file.
>> 
>> Hope that helps.
> 

