cassandra-user mailing list archives

From Patrik Modesto <patrik.mode...@gmail.com>
Subject Re: timeout while doing repair
Date Thu, 24 Nov 2011 19:14:49 GMT
We have our own servers: each has a 16-core CPU, 32 GB RAM, and eight 1 TB disks.

I didn't check tpstats, just iotop, where Cassandra used all the I/O capacity
when compacting/repairing.

I had to completely wipe the test cluster, but I'll check tpstats in
production. What should I look for?

Regards,
Patrik
On 24.11.2011 19:13, "Jahangir Mohammed" <md.jahangir27@gmail.com> wrote:

> As far as I know, the timeout is caused by the increased load on the node
> due to repair.
>
> Hardware? EC2?
>
> Did you check tpstats?
>
> On Thu, Nov 24, 2011 at 11:42 AM, Patrik Modesto
> <patrik.modesto@gmail.com> wrote:
>
>> Thanks for the reply. I know I can configure a longer timeout, but in
>> our use case a reply that takes longer than 1 second is unacceptable.
>>
>> What I don't understand is why I get timeouts while reading a different
>> keyspace than the one the repair is working on. I even get timeouts
>> during compaction.
>>
>> Besides the usual access, we do lots of reads and writes using Hadoop
>> MapReduce jobs, so we need to compact/repair quite often.
>>
>> Regards
>> Patrik
>> On 24.11.2011 15:00, "Jahangir Mohammed" <md.jahangir27@gmail.com>
>> wrote:
>>
>>> Are you using a client that imposes this timeout?
>>>
>>> If you don't specify a timeout in the client, look at rpc_timeout_in_ms.
>>> Increase it and see whether the problem persists.
>>>
>>> Repair is a costly process.
>>>
>>> Thanks,
>>> Jahangir Mohammed.
>>>
>>>
>>>
>>> On Thu, Nov 24, 2011 at 2:45 AM, Patrik Modesto <
>>> patrik.modesto@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a test cluster of 4 nodes running Debian and Cassandra 0.8.7.
>>>> There are 3 keyspaces, all with RF=3, and each node has a load of
>>>> around 40GB.
>>>> When I run "nodetool repair" after a while all thrift clients that
>>>> read with CL.QUORUM get TimeoutException and even some that use just
>>>> CL.ONE. I've tried to run repair on just one keyspace and read from
>>>> other keyspace, but I still get the TimeoutException.
>>>>
>>>> I tried to tune compaction_throughput_mb_per_sec and
>>>> concurrent_compactors but without success. The same problem is
>>>> happening on our production cluster of 8 nodes (same setup).
>>>>
>>>> Where might the problem be?
>>>>
>>>> Regards,
>>>> Patrik
>>>>
>>>
>>>
>
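
For reference, the consistency-level arithmetic behind the original report
(RF=3, reads at CL.QUORUM and CL.ONE) can be sketched as below. This is only
an illustration and a simplification: the function names are hypothetical,
nothing here calls Cassandra, and it assumes a read succeeds once the
required number of replicas answer before the timeout.

```python
# Illustrative only: hypothetical helpers, not part of any Cassandra API.

def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must answer for a read at the given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def tolerable_slow_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replicas that may miss the timeout without failing the read."""
    return replication_factor - required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    rf = 3  # replication factor from the original post
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_replicas(cl, rf), tolerable_slow_replicas(cl, rf))
```

Under these assumptions, a QUORUM read with RF=3 needs two replicas to
answer in time and tolerates one slow replica, while a read at ONE needs
only a single answer.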
