From: E S
To: user@cassandra.apache.org
Date: Tue, 30 Nov 2010 06:18:51 -0800 (PST)
Subject: Re: Achieving isolation on single row modifications with batch_mutate
I'm chunking up a larger blob.  Basically, the size of each row can vary (averaging around 500 KB - 1 MB, with some outliers in the 50 MB range).  However, when I do an update, I can usually just read/update a portion of that blob.  A lot of my read operations can also work on a smaller chunk.  The number of columns is going to depend on the size of the blob itself.  I'm also considering using supercolumns to get finer-grained saves.

My biggest problem is that I will have to update these rows a lot (several times a day) and often very quickly (process 15 thousand in 2-3 minutes).  While I think I could probably scale up with a lot of hardware to meet that load, it seems like I'm doing much more work than I need to (processing 15 GB of data in 2-3 minutes as opposed to 100 MB).  I also worry about handling our future data size needs.

I can split the blob up without a lot of extra complexity but am worried about how to have readers read a non-corrupted version of the object, since sometimes I'll have to update multiple chunks as one unit.
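To make the chunking concrete, here is a minimal sketch (my own illustration, not anything settled in this thread) of splitting a blob into fixed-size chunks stored as one column per chunk. The column-name scheme and the 512 KB chunk size are assumptions; zero-padded names sort correctly as strings, which makes range slices over chunks straightforward:

```python
CHUNK_SIZE = 512 * 1024  # 512 KB per column; an assumed value, tune to taste

def split_blob(blob: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Return a {column_name: value} dict suitable for a batch_mutate-style write."""
    columns = {}
    for i in range(0, len(blob), chunk_size):
        # Zero-padded index so string-sorted column names preserve chunk order.
        columns["chunk:%06d" % (i // chunk_size)] = blob[i:i + chunk_size]
    return columns

def join_blob(columns: dict) -> bytes:
    """Reassemble the blob from chunk columns read back, in name order."""
    return b"".join(columns[name] for name in sorted(columns))
```

With this layout, a partial update only rewrites the columns whose byte ranges changed, and a partial read only slices the columns it needs.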


From: Tyler Hobbs <tyler@riptano.com>
To: user@cassandra.apache.org
Sent: Tue, November 30, 2010 12:57:07 AM
Subject: Re: Achieving isolation on single row modifications with batch_mutate

In this case, it sounds like you should combine columns A and B if you
are writing them both at the same time, reading them both at the same
time, and need them to be consistent.

Obviously, you're probably dealing with more than two columns here, but
there's generally not any value in splitting something into multiple columns
if you're always writing and reading all of them at the same time.
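As a sketch of that suggestion (my illustration; the serialization format is an assumption), two always-together values can be packed into a single column value, so one column write carries both and a reader can never see A without B:

```python
import json

def pack(a, b) -> bytes:
    """Serialize the two logical fields into one column value."""
    return json.dumps({"a": a, "b": b}).encode("utf-8")

def unpack(value: bytes):
    """Recover both fields from a single column read."""
    d = json.loads(value.decode("utf-8"))
    return d["a"], d["b"]
```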

Or are you talking about chunking huge blobs across a row?

- Tyler

On Sat, Nov 27, 2010 at 10:12 AM, E S <tr1sklion@yahoo.com> wrote:
I'm trying to figure out the best way to achieve single row modification
isolation for readers.

As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both rows,
I don't care if the user sees the write operations completed on 1 and not on 2
for a short time period (seconds).  I also don't care if when reading row 1 the
user gets the new value, and then on a re-read gets the old value (within a few
seconds).  Because of this, I have been planning on using a consistency level of
one.

However, if I modify both columns A,B on a single row, I need both changes on
the row to be visible/invisible atomically.  It doesn't matter if they both
become visible and then both invisible as the data propagates across nodes, but
a half-completed state on an initial read will basically be returning corrupt
data given my app's consistency requirements.  My understanding from the FAQ is that
this single-row, multi-column change provides no read isolation, so I will have
this problem.  Is this correct?  If so:
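One way to sketch the all-or-nothing visibility I'm after, without locks (this is an assumed pattern, not something the thread confirms works): stage the new column values under a fresh generation prefix, then flip a single "current" pointer column last. Since a reader resolves the pointer before reading data, it sees either the old generation or the new one in full, never a mix. Modeled here with a plain dict standing in for a row:

```python
def write_generation(row: dict, generation: int, columns: dict) -> None:
    """Stage all columns under a new generation, then publish with one write."""
    for name, value in columns.items():
        row["g%d:%s" % (generation, name)] = value
    # The single pointer update is the only step that changes what readers see.
    row["current"] = generation

def read_current(row: dict) -> dict:
    """Read the pointer first, then only the columns of that generation."""
    prefix = "g%d:" % row["current"]
    return {name[len(prefix):]: value
            for name, value in row.items() if name.startswith(prefix)}
```

The caveat is that the pointer write itself is still subject to replica propagation, so different readers may briefly disagree about which generation is current, which matches the staleness I said I can tolerate.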

Question 1:  Is there a way to get this type of isolation without using a
distributed locking mechanism like cages?

Question 2:  Are there any plans to implement this type of isolation within
Cassandra?

Question 3:  If I went with a distributed locking mechanism, what consistency
level would I need to use with Cassandra?  Could I still get away with a
consistency level of one?  It seems that even if the initial write is done in a
non-isolated way, as long as cross-node row synchronizations are applied all or
nothing, I could still use one.

Question 4:  Does anyone know of a good C# alternative to Cages/ZooKeeper?

Thanks for any help with this!





