From: Benedict Elliott Smith
Date: Fri, 28 Oct 2016 10:00:10 +0100
Subject: Re: How does the "batch" commit log sync works
To: "user@cassandra.apache.org"

The batch window (commitlog_sync_batch_window_in_ms) is the maximum length of time that queries may be batched together, not the minimum.
If there is a break in the flow of queries for the commit log, it will commit those outstanding immediately. In any case it will commit in clusters the size of a commit log file (default 32MB).

I know the documentation used to disagree with itself in a few places, and with actual behaviour, but I thought that had been fixed. I suggest you file a ticket if you find a mention that does not match this description.

Really, the batch period is a near-useless parameter. If it were honoured as a minimum, performance would decline because of the threading model in Cassandra (and it will be years before this and memory management improve enough to support that behaviour).

Conversely, honouring it as a maximum is only possible for very small values, just by the nature of queueing theory.

I believe I proposed removing the parameter entirely some time ago, though it is lost in the mists of time.

Anyway, many people do indeed use this commitlog mode successfully, although it is far less common than periodic mode. This behaviour does not mean your data is in any way unsafe.

On Friday, 28 October 2016, Edward Capriolo wrote:

> I mentioned during my Cassandra.yaml presentation at the summit that I
> never saw anyone use these settings. Things off by default are typically
> not covered well by tests. It sounds like it is not working.
> Quick suggestion: go back in time maybe to a version like 1.2.X or 0.7 and
> see if it behaves like the yaml suggests it should.
>
> On Thu, Oct 27, 2016 at 11:48 PM, Hiroyuki Yamada wrote:
>
>> Hello Satoshi and the community,
>>
>> I am also using commitlog_sync for durability, but I have never
>> modified the commitlog_sync_batch_window_in_ms parameter yet,
>> so I wondered if it is working or not.
>>
>> As Satoshi said, I also changed commitlog_sync_batch_window_in_ms (to
>> 10000) and restarted C* and
>> issued some INSERT commands.
>> But, it actually returned immediately right after issuing.
>>
>> So, it seems like the parameter is not working correctly.
>> Are we missing something?
>>
>> Thanks,
>> Hiro
>>
>> On Thu, Oct 27, 2016 at 5:58 PM, Satoshi Hikida wrote:
>> > Hi, all.
>> >
>> > I have a question about "batch" commit log sync behavior with C* version
>> > 2.2.8.
>> >
>> > Here's what I have done:
>> >
>> > * set commitlog_sync to the "batch" mode as follows:
>> >
>> >> commitlog_sync: batch
>> >> commitlog_sync_batch_window_in_ms: 10000
>> >
>> > * ran a script which inserts the data to a table
>> > * prepared a disk dedicated to store the commit logs
>> >
>> > According to the DataStax document, I expected that fsync is done once in a
>> > batch window (one fsync per 10sec in this case) and writes issued within
>> > this batch window are blocked until fsync is completed.
>> >
>> > In my experiment, however, it seems that the write requests returned almost
>> > immediately (within 300~400 ms).
>> >
>> > Am I misunderstanding something? If so, can someone give me any advice as
>> > to the reason why C* behaves like this?
>> >
>> >
>> > I referred to this document:
>> > https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__PerformanceTuningProps
>> >
>> > Regards,
>> > Satoshi
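
To make the maximum-versus-minimum point from the top of this reply concrete, here is a minimal Python sketch of a group-commit loop. It is a toy model under stated assumptions, not Cassandra's actual code: the names GroupCommitLog, window_s, fsync and timed_write are all hypothetical, and the fsync callable merely stands in for a real disk sync. The syncer sleeps for at most window_s while idle, but an arriving write wakes it at once, so a lone write returns long before the window has elapsed.

```python
import threading
import time


class GroupCommitLog:
    """Toy model: writers block until a sync covers their write.

    window_s is only the longest the syncer sleeps while idle; an
    arriving write wakes it at once, so it is a maximum, not a minimum.
    """

    def __init__(self, window_s, fsync=lambda: None):
        self.window_s = window_s
        self.fsync = fsync          # hypothetical stand-in for a disk sync
        self.cond = threading.Condition()
        self.appended = 0           # writes accepted so far
        self.synced = 0             # writes covered by a completed sync
        self.sync_count = 0         # number of syncs performed
        self.closed = False
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def write(self):
        """Append one write and block until a sync has covered it."""
        with self.cond:
            self.appended += 1
            seq = self.appended
            self.cond.notify_all()  # wake the syncer immediately
            self.cond.wait_for(lambda: self.synced >= seq)

    def close(self):
        """Stop the syncer after it has covered any outstanding writes."""
        with self.cond:
            self.closed = True
            self.cond.notify_all()
        self.thread.join()

    def _run(self):
        while True:
            with self.cond:
                # Sleep at most window_s, but return early if work arrives.
                self.cond.wait_for(
                    lambda: self.closed or self.appended > self.synced,
                    timeout=self.window_s,
                )
                if self.appended == self.synced:
                    if self.closed:
                        return
                    continue        # idle timeout, nothing to sync
                target = self.appended  # everything outstanding right now
            self.fsync()            # one sync covers the whole batch
            with self.cond:
                self.synced = target
                self.sync_count += 1
                self.cond.notify_all()


def timed_write(log):
    """Return how long a single blocking write took, in seconds."""
    start = time.monotonic()
    log.write()
    return time.monotonic() - start


if __name__ == "__main__":
    log = GroupCommitLog(window_s=10.0)
    print(f"single write took {timed_write(log):.4f}s with a 10s window")
    log.close()
```

In this model, writes that arrive while a sync is in progress are picked up together by the next sync, which is the only batching that occurs; the window never delays a write that arrives when the syncer is idle.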