cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Mixing incremental repair with sequential
Date Fri, 26 Jun 2015 23:40:20 GMT
Here is something I wrote some time ago:

http://planetcassandra.org/blog/interview/video-advertising-platform-teads-chose-cassandra-spm-and-opscenter-to-monitor-a-personalized-ad-experience/

Monitoring is absolutely necessary to understand what is happening in the
system. There is no magic in there, and if you find bottlenecks, you can
think about how to alleviate them. I would say it matters at least as much as
the design of your data models.

"I've lowered compaction threshold from 18 to 10 MB/s. Will see what
happens."
If you have no SSD and compactions are creating a bottleneck at the disk,
this looks reasonable as long as the "compactions pending" metric
remains low enough.
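
As a rough illustration of watching that metric, here is a minimal Python
sketch that reads the pending count out of the text printed by
`nodetool compactionstats`. It assumes the output contains a line of the form
"pending tasks: N". The function names and the `max_pending` threshold are
hypothetical and only for illustration, not Cassandra defaults.

```python
import re


def pending_compactions(compactionstats_output: str) -> int:
    """Return the count from the first 'pending tasks: N' line.

    Assumes the text comes from `nodetool compactionstats`.
    """
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def throughput_looks_ok(compactionstats_output: str, max_pending: int = 20) -> bool:
    # max_pending is an arbitrary illustrative threshold; pick one that
    # matches what "low enough" means for your own cluster.
    return pending_compactions(compactionstats_output) <= max_pending


if __name__ == "__main__":
    sample = "pending tasks: 3\n"  # hand-written sample, not real cluster output
    print(throughput_looks_ok(sample))
```

In practice the input could come from something like
`subprocess.run(["nodetool", "compactionstats"], capture_output=True, text=True).stdout`,
checked periodically after changing the throttle with
`nodetool setcompactionthroughput 10`.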

If it is a CPU issue and you have many cores, I would advise you to try
lowering the concurrent_compactors setting (by default, 1 compactor per core).

Once again, it will depend on where the pressure is. In any case, you might
want to try any change on one node only at first. Also, change one option at
a time (or a couple that you believe would work in synergy), and monitor how
things evolve.

C*heers,

Alain

2015-06-26 21:30 GMT+02:00 Carl Hu <me@carlhu.com>:

> Thank you, Alain, for the response. We're using 2.1 indeed. I've lowered
> compaction threshold from 18 to 10 MB/s. Will see what happens.
>
> >  I hope you have a monitoring tool up and running and an easy way to
> detect errors in your logs.
>
> We do not have this. What do you use for this?
>
> Thank you,
> Carl
>
>
> On Fri, Jun 26, 2015 at 11:26 AM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
>> "It is not possible to mix sequential repair and incremental repairs."
>>
>> I guess that is a system limitation, even if I am not sure of it (I have
>> not used C* 2.1 yet).
>>
>> I would focus on tuning your repair by:
>> - Monitoring performance / logs (to see why the cluster hangs)
>> - Using range repairs (as a workaround to the Merkle tree 32K limit) or at
>> least running it per table (
>> http://www.datastax.com/dev/blog/advanced-repair-techniques)
>>
>> Without knowing the root issue that makes your cluster hang, it is
>> hard to help you.
>>
>> - If CPU is a limit, then some tuning around compactions or GC might be
>> needed (or a few more things)
>> - If you have disk I/O limitations, you might want to add machines or tune
>> compaction throughput
>> - If your network is the issue, there are commands to tune the bandwidth
>> used by streams.
>>
>> You need to troubleshoot this and give us more information. I hope you
>> have a monitoring tool up and running and an easy way to detect errors in
>> your logs.
>>
>> C*heers,
>>
>> Alain
>>
>> 2015-06-26 16:26 GMT+02:00 Carl Hu <me@carlhu.com>:
>>
>>> Dear colleagues,
>>>
>>> We are using incremental repair and have noticed that every few repairs,
>>> the cluster experiences pauses.
>>>
>>> We run the repair with the following command: nodetool repair -par -inc
>>>
>>> I have tried to run it not in parallel, but get the following error:
>>> "It is not possible to mix sequential repair and incremental repairs."
>>>
>>> Does anyone have any suggestions?
>>>
>>> Many thanks in advance,
>>> Carl
>>>
>>>
>>
>
