aurora-dev mailing list archives

From Bhuvan Arumugam <>
Subject Re: aurora replica log snapshot interval
Date Tue, 02 Jun 2015 17:51:06 GMT
On Tue, Jun 2, 2015 at 10:25 AM, Maxim Khutornenko <> wrote:
> Hi Bhuvan,
> We have never had to change the native_log_write timeout from its
> default value, but we have definitely seen problems with scheduler
> failovers related to snapshotting. It is indeed an IO-intensive
> operation that can and will block all other activities, especially
> when overlapped with a backup creation. During snapshot creation an
> exclusive write lock is held, making all other mutation operations
> impossible. Reads may still be served, though.

Thank you, Maxim!
Useful info indeed! We'll refrain from changing the snapshot interval.
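The locking behavior Maxim describes (mutations blocked during a snapshot, reads still served) can be sketched with a plain `ReentrantLock` that only writers and the snapshotter acquire. This is a minimal illustration of the concept, not Aurora's actual storage code; the class and method names are made up for the example:

```java
import java.util.Map;
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model: snapshotting and mutations contend on one exclusive lock,
// while reads bypass it entirely (ConcurrentHashMap keeps lock-free
// reads safe). Illustration only -- not Aurora's implementation.
class StorageSketch {
    private final ReentrantLock mutationLock = new ReentrantLock();
    private final Map<String, String> state = new ConcurrentHashMap<>();

    void write(String key, String value) {
        mutationLock.lock();          // stalls while a snapshot is running
        try {
            state.put(key, value);
        } finally {
            mutationLock.unlock();
        }
    }

    String read(String key) {
        return state.get(key);        // no lock: reads are still served
    }

    Map<String, String> snapshot() {
        mutationLock.lock();          // exclusive: all writes wait here
        try {
            return new HashMap<>(state);  // copy out a consistent view
        } finally {
            mutationLock.unlock();
        }
    }
}
```

The longer `snapshot()` holds the lock (an IO-bound serialization in the real scheduler), the longer every pending write waits, which is why a long snapshot can push writes past the native_log_write timeout.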

> I would suggest a more thorough investigation to make sure it was
> truly a native_log_write timeout that caused your failover.

Yes, we confirmed it's due to write timeout, in this case:
     Caused by: java.util.concurrent.TimeoutException: Timed out while
attempting to append

> Identifying the root cause is crucial here as we have seen two major
> causes for failovers: excessive GC activity leading to ZK timeouts and
> slow disk IO blocking writes in underlying native log storage. Below
> are a few leads:
> Excessive GC:
> - consider using snapshot de-duplication [1] if you are not already
> using it. This has helped us significantly reduce GC activity and
> stored snapshot size.

Interesting. Right now we haven't enabled snapshot de-dup; we'll enable it.

> - consider finely tuning your GC perf. It's not an easy task but there
> are plenty of online resources to help (e.g. [2]).
> Excessive IO:
> - consider changing your underlying system IO scheduler. By just
> switching from cfq to deadline we have virtually eliminated our
> failovers due to excessive IO. See AURORA-1211 for details.

Sure. We are using the cfq I/O scheduler on our scheduler hosts. We'll
investigate whether switching to deadline improves the situation.
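For reference, the I/O scheduler can be inspected and switched at runtime through sysfs. The device name `sda` below is a placeholder for the disk backing the native log, and the runtime change does not survive a reboot, so it is usually paired with a kernel boot parameter:

```shell
# Show the active I/O scheduler for a disk; the bracketed entry is active,
# e.g. "noop [cfq] deadline"
cat /sys/block/sda/queue/scheduler

# Switch to deadline at runtime (root required; lost on reboot)
echo deadline > /sys/block/sda/queue/scheduler

# To persist across reboots, set it on the kernel command line,
# e.g. GRUB_CMDLINE_LINUX="elevator=deadline" in /etc/default/grub,
# then regenerate the grub config.
```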

> [1] -
> [2] -
> On Tue, Jun 2, 2015 at 9:33 AM, Bhuvan Arumugam <> wrote:
>> Hello,
>> In a 300-node cluster with 5 schedulers in the quorum, the replica log
>> writes fail due to timeout (native_log_write_timeout: 3secs),
>> especially when 50+ tasks are flapping. The next leader takes around
>> 2+ minutes to complete the log replay and become active. The service is
>> inaccessible to users during that window, as Aurora isn't yet listening
>> on the port, so users face 503 errors. Why? No snapshot was taken during
>> the last few hours because the crash happened within the configured
>> snapshot interval (default: 1 hour).
>> We bumped the log write timeout and are investigating the reason for
>> the timeout in parallel, e.g. whether it's due to bad hardware. In the
>> meantime, we want to reduce service disruption to users by bringing
>> down the replay time. I'd like to know:
>> a) is reducing the snapshot interval (dlog_snapshot_interval) to 30
>> mins the right thing to do?
>> b) is the snapshot event I/O intensive?
>> c) it takes 0-6 seconds to snapshot 10k events since the last snapshot.
>> Does the scheduler block user requests while a snapshot is in progress?
>> Thank you,
>> --
>> Regards,
>> Bhuvan Arumugam

Bhuvan Arumugam
