cassandra-user mailing list archives

From Ja Sam <ptrstp...@gmail.com>
Subject Re: Permanent ReadTimeout
Date Tue, 13 Jan 2015 09:22:06 GMT
Ad 4) I definitely have a big problem: pending tasks: 3094

The question is: what should I change or monitor? I can present my whole
solution design if it helps.
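
One way to monitor the compaction backlog mentioned above is to poll the pending-task count and compare it with the "< 5 outstanding tasks" rule of thumb quoted further down this thread. The following is a minimal sketch, not a definitive tool: it assumes `nodetool compactionstats` prints a line of the form `pending tasks: N` (as in the figure above), and the function names and threshold constant are hypothetical, chosen only for illustration.

```python
import re
import subprocess

# Assumption: the output contains a line such as "pending tasks: 3094".
PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)

# Rule of thumb from this thread: typically fewer than 5 outstanding tasks.
BACKLOG_THRESHOLD = 5


def parse_pending_tasks(output: str) -> int:
    """Extract the pending compaction task count from compactionstats text."""
    match = PENDING_RE.search(output)
    if match is None:
        raise ValueError("no 'pending tasks:' line found in output")
    return int(match.group(1))


def is_behind_on_compaction(pending: int, threshold: int = BACKLOG_THRESHOLD) -> bool:
    """Return True when the backlog is at or above the threshold."""
    return pending >= threshold


def read_compactionstats() -> str:
    """Run nodetool on the local node; requires nodetool on the PATH."""
    result = subprocess.run(
        ["nodetool", "compactionstats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Demonstration with the value reported in this message (no cluster needed).
sample_output = "pending tasks: 3094"
pending = parse_pending_tasks(sample_output)
print(pending, is_behind_on_compaction(pending))
```

On a real node, `parse_pending_tasks(read_compactionstats())` could be run periodically on each node of DC_A and DC_B to see whether the backlog is shrinking or growing over time.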

On Mon, Jan 12, 2015 at 8:32 PM, Ja Sam <ptrstpppp@gmail.com> wrote:

> To respond to your remarks:
>
> 1) About the 30 sec GC: I know that my cluster developed this problem over
> time. We added the "magic" flag, but the result will only be visible in
> ~2 weeks (as I showed in the screenshot on StackOverflow). If you have any
> idea how I can fix or diagnose this problem, I will be very grateful.
>
> 2) It is probably true, but I don't think that I can change it. Our data
> centers are in different places and the network between them is not
> perfect. But as we observed, network partitions happen rarely: at most
> once a week, for an hour.
>
> 3) We are trying to run regular (incremental) repairs, but usually they
> do not finish. Even local repairs have trouble finishing.
>
> 4) I will check it as soon as possible and post it here. If you have any
> suggestions about what else I should check, they are welcome :)
>
>
>
>
> On Mon, Jan 12, 2015 at 7:28 PM, Eric Stevens <mightye@gmail.com> wrote:
>
>> If you're getting 30 second GC's, this all by itself could and probably
>> does explain the problem.
>>
>> If you're writing exclusively to A, and there are frequent partitions
>> between A and B, then A is potentially working a lot harder than B, because
>> it needs to keep track of hinted handoffs to replay to B whenever
>> connectivity is restored.  It's also acting as coordinator for writes which
>> need to end up in B eventually.  This in turn may be a significant
>> contributing factor to your GC pressure in A.
>>
>> I'd also grow suspicious of the integrity of B as a reliable backup of A
>> unless you're running repair on a regular basis.
>>
>> Also, if you have thousands of SSTables, then you're probably falling
>> behind on compaction, check nodetool compactionstats - you should typically
>> have < 5 outstanding tasks (preferably 0-1).  If you're not behind on
>> compaction, your sstable_size_in_mb might be a bad value for your use case.
>>
>> On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam <ptrstpppp@gmail.com> wrote:
>>
>>> *Environment*
>>>
>>>
>>>    - Cassandra 2.1.0
>>>    - 5 nodes in one DC (DC_A), 4 nodes in second DC (DC_B)
>>>    - 2500 writes per second; I write only to DC_A with local_quorum
>>>    - minimal reads (usually none, sometimes a few)
>>>
>>> *Problem*
>>>
>>> After a few weeks of running I cannot read any data from my cluster,
>>> because I get a ReadTimeoutException like the following:
>>>
>>> ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message.
>>> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.
>>>
>>> To be precise, it is not the only problem in my cluster. The second one is
>>> described here: Cassandra GC takes 30 seconds and hangs node
>>> <http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node>,
>>> and I will try to use the fix from CASSANDRA-6541
>>> <http://issues.apache.org/jira/browse/CASSANDRA-6541>, as leshkin
>>> suggested.
>>>
>>> *Diagnosis*
>>>
>>> I tried some of the tools presented at
>>> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>>> by Jon Haddad and got some strange results.
>>>
>>>
>>> I ran the same query in DC_A and DC_B with tracing enabled. The query
>>> is simple:
>>>
>>>    SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);
>>>
>>> where the table is defined as follows:
>>>
>>>   CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int,
>>>   bucket int, event_time bigint, event_id blob, event_type int, event blob,
>>>   PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...]
>>>
>>> Results of the query:
>>>
>>> 1) In DC_B the query finished in less than 0.22 seconds. In DC_A it took
>>> more than 2.5 seconds (~10 times longer). -> the problem is that bucket
>>> can be in the range from -128 to 256
>>>
>>> 2) In DC_B it checked ~1000 SSTables with lines like:
>>>
>>>    Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
>>>
>>> whereas in DC_A it is:
>>>
>>>    Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
>>>
>>> 3) The total number of records was the same in both DCs.
>>>
>>>
>>> *Question*
>>>
>>> The question is quite simple: how can I speed up DC_A? It is my primary
>>> DC, DC_B is mostly for backup, and there are a lot of network partitions
>>> between A and B.
>>>
>>> Maybe I should check something more, but I just don't have an idea what
>>> it should be.
>>>
>>>
>>>
>>
>
