incubator-cassandra-user mailing list archives

From Jahangir Mohammed <md.jahangi...@gmail.com>
Subject Re: timeout while doing repair
Date Thu, 24 Nov 2011 19:22:53 GMT
That will give you a snapshot of the thread pools. Look at
ROW-READ-STAGE and check its pending and active counts. If many requests
are pending, the cluster is not keeping up with the incoming read
requests.
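
To make that check repeatable, here is a minimal Python sketch that reads
tpstats-style text and reports whether the read stage has a backlog. It
assumes the output is a table whose rows start with the pool name followed by
the active and pending counts. The sample text, its numbers, the function
names and the threshold are all made up for illustration, and the pool name
printed by your version may differ from ROW-READ-STAGE, so it is a parameter.

```python
def parse_tpstats(text):
    """Return {pool_name: (active, pending)} from tpstats-style text.

    Assumes each data row is: name, active, pending, then other columns.
    """
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows whose 2nd and 3rd fields are integers; this skips
        # the header row and any blank or free-text lines.
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            pools[parts[0]] = (int(parts[1]), int(parts[2]))
    return pools


def has_read_backlog(text, pool="ROW-READ-STAGE", max_pending=0):
    """True if the given pool has more than max_pending pending tasks."""
    pools = parse_tpstats(text)
    if pool not in pools:
        raise KeyError("pool %r not found in tpstats output" % pool)
    _active, pending = pools[pool]
    return pending > max_pending


# Hypothetical sample, not real output from any cluster.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ROW-READ-STAGE                   32       417        1204563
ROW-MUTATION-STAGE                0         0         998812
"""

if __name__ == "__main__":
    print(parse_tpstats(SAMPLE))
    print("read backlog:", has_read_backlog(SAMPLE))
```

In practice you would feed it the text captured from "nodetool tpstats" on
each node while the repair is running, and compare it with a run taken when
the cluster is idle.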

Thanks,
Jahangir Mohammed.

On Thu, Nov 24, 2011 at 2:14 PM, Patrik Modesto <patrik.modesto@gmail.com> wrote:

> We have our own servers: 16-core CPU, 32 GB RAM, 8 x 1 TB disks.
>
> I didn't check tpstats, just iotop, where Cassandra used all the I/O
> capacity when compacting/repairing.
>
> I had to completely wipe the test cluster, but I'll check tpstats in
> production. What should I look for?
>
> Regards,
> Patrik
> On 24.11.2011 19:13, "Jahangir Mohammed" <md.jahangir27@gmail.com>
> wrote:
>
>> As far as I know, the timeout is caused by the increased load on the
>> node during repair.
>>
>> Hardware? EC2?
>>
>> Did you check tpstats?
>>
>> On Thu, Nov 24, 2011 at 11:42 AM, Patrik Modesto <
>> patrik.modesto@gmail.com> wrote:
>>
>>> Thanks for the reply. I know I can configure a longer timeout, but in
>>> our use case a reply that takes longer than 1 second is unacceptable.
>>>
>>> What I don't understand is why I get timeouts while reading from a
>>> different keyspace than the one the repair is working on. I get
>>> timeouts even during compaction.
>>>
>>> Besides the usual access, we do lots of reads and writes using Hadoop
>>> MapReduce jobs, so we need to compact/repair quite often.
>>>
>>> Regards
>>> Patrik
>>> On 24.11.2011 15:00, "Jahangir Mohammed" <md.jahangir27@gmail.com>
>>> wrote:
>>>
>>>> Are you using a client that imposes this timeout?
>>>>
>>>> If you don't specify any timeout from client, look at
>>>> rpc_timeout_in_ms. Increase it and see if you still suffer this.
>>>>
>>>> Repair is a costly process.
>>>>
>>>> Thanks,
>>>> Jahangir Mohammed.
>>>>
>>>>
>>>>
>>>> On Thu, Nov 24, 2011 at 2:45 AM, Patrik Modesto <
>>>> patrik.modesto@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a test cluster of 4 nodes running Debian and Cassandra 0.8.7.
>>>>> There are 3 keyspaces, all with RF=3, and a node has a load of around
>>>>> 40GB. When I run "nodetool repair", after a while all Thrift clients
>>>>> that read with CL.QUORUM get a TimeoutException, and even some that
>>>>> use just CL.ONE. I've tried running repair on just one keyspace and
>>>>> reading from another keyspace, but I still get the TimeoutException.
>>>>>
>>>>> I tried to tune compaction_throughput_mb_per_sec and
>>>>> concurrent_compactors but without success. The same problem is
>>>>> happening on our production cluster of 8 nodes (same setup).
>>>>>
>>>>> Where might the problem be?
>>>>>
>>>>> Regards,
>>>>> Patrik
>>>>>
>>>>
>>>>
>>
