Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
Date: Sun, 4 Jul 2010 11:19:05 +0300
Subject: Re: Write assurance in Cassandra
From: David Boxenhorn
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0016361e887e36b83c048a8b79ad

--0016361e887e36b83c048a8b79ad
Content-Type: text/plain; charset=ISO-8859-1

Yes, it was. I was dumping data from Oracle into Cassandra.

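
A minimal sketch of the periodic versus batch distinction described in the quoted thread, assuming a toy append-only log. This is not Cassandra's implementation: the class name CommitLogSketch and its parameters mode and sync_period_ms are hypothetical, and the periodic check runs on each write here rather than on a background timer. It only illustrates why, in periodic mode, a write can be acknowledged before it reaches disk, so the file on disk can lag behind.

```python
import os
import tempfile
import time


class CommitLogSketch:
    """Toy append-only log (hypothetical; not Cassandra's code)."""

    def __init__(self, path, mode="periodic", sync_period_ms=10000):
        if mode not in ("periodic", "batch"):
            raise ValueError("mode must be 'periodic' or 'batch'")
        self.mode = mode
        self.sync_period_s = sync_period_ms / 1000.0
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.unsynced_writes = 0

    def _sync(self):
        # Push buffered bytes to the OS, then ask the OS to write them to disk.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.last_sync = time.monotonic()
        self.unsynced_writes = 0

    def write(self, record):
        """Append a record; return True if it was synced before the ack."""
        self.file.write(record + b"\n")
        self.unsynced_writes += 1
        if self.mode == "batch":
            # Batch mode: do not acknowledge until the data has been synced.
            self._sync()
            return True
        # Periodic mode: acknowledge immediately; sync only once the
        # configured period has elapsed since the last sync.
        if time.monotonic() - self.last_sync >= self.sync_period_s:
            self._sync()
            return True
        return False

    def close(self):
        self._sync()
        self.file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("periodic", "batch"):
            log = CommitLogSketch(os.path.join(tmp, mode + ".log"), mode=mode)
            durable = log.write(b"row-1")
            print(mode, "acknowledged; durable at ack time:", durable)
            log.close()
```

With a 10000 ms period, the first periodic write is acknowledged without a sync, which matches the observation that the commit log files do not appear to change right away.
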
On Sun, Jul 4, 2010 at 11:11 AM, Andrew Rollins wrote:

> Is your IO under heavy load? If it is, that may be the cause, otherwise I'm
> not sure what causes significant lag. On Linux I like to use "iostat -tx 10"
> to check IO.
>
> - Andrew
>
>
> On Sun, Jul 4, 2010 at 4:04 AM, David Boxenhorn wrote:
>
>> Thank you very much! I now understand things much better.
>>
>> However, my configuration is as follows:
>>
>>   <CommitLogSync>periodic</CommitLogSync>
>>   <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>>
>> So I should see my commit log change after 10,000 milliseconds = 10
>> seconds? It seems to take much longer to show up.
>>
>> On Sun, Jul 4, 2010 at 10:52 AM, Andrew Rollins wrote:
>>
>>> By default Cassandra syncs the commit log to disk periodically, so if you
>>> are looking at file sizes, you won't see the most up to date numbers. This
>>> is just like how if you tail a file that isn't flushing frequently, you
>>> might wait a little while before you see the updates.
>>>
>>> In periodic mode, Cassandra acknowledges the write to the client
>>> immediately (even before it is synced). You can run Cassandra in batch mode
>>> instead, which basically means it writes in batches *and* it won't
>>> acknowledge the writes to the client until it has actually synced. I'm still
>>> somewhat new to this, but that's my understanding.
>>>
>>> Have a look at CommitLogSync in your storage-conf.xml for more info about
>>> setting up syncing periods.
>>>
>>> As an aside, I'm not sure why the "ack immediately" or "ack after sync"
>>> setting is piggybacked on the periodic vs batch setting. At first glance it
>>> seems like concepts should be independent of one another.
>>>
>>> - Andrew
>>>
>>>
>>> On Sun, Jul 4, 2010 at 3:34 AM, David Boxenhorn wrote:
>>>
>>>> As I understand it, when you write to Cassandra, you are assured that,
>>>> if successful, the new data has been written to a log file - so that if
>>>> there is a crash your data is safe. Is this correct?
>>>>
>>>> If the above is correct, there is something going on that I don't
>>>> understand. Are the log files to which the data is first written the ones
>>>> that look like /var/lib/cassandra/commitlog/CommitLog-1277998453387.log ?
>>>> The reason I ask is that when I write a lot of data, nothing seems to change
>>>> in the commitlog directory for a long time, then at some point the log files
>>>> in this directory get updated. It looks to me like there's memory caching
>>>> involved, and the new data is not being immediately written to disk. What is
>>>> going on?
>>>>
>>>
>>>
>>
>

--0016361e887e36b83c048a8b79ad
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--0016361e887e36b83c048a8b79ad--