From: aaron morton
Subject: Re: Batch mutation streaming
Date: Mon, 10 Dec 2012 13:47:38 +1300
To: user@cassandra.apache.org

> (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
Yerp. 
And every row mutation in your batch becomes a task in the Mutation thread pool. If one replica gets 500 row mutations from one client request, it will take a while for the (default) 32 threads to chew through them. While this is going on, other client requests will be effectively blocked.

Depending on the number of clients, I would start with, say, 50 rows per mutation and keep an eye on the *request* latency.
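
A minimal sketch of that approach in Python, for illustration only. It cuts a long event stream into bounded batches and times each request, so the client never holds more than one batch in memory. `chunked`, `stream_in_batches` and `send_batch` are hypothetical names, not part of any Cassandra client; `send_batch` stands in for whatever call your client library uses to send one batch mutation.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(events: Iterable[T], rows_per_batch: int = 50) -> Iterator[List[T]]:
    """Yield lists of at most rows_per_batch events without buffering the whole stream."""
    if rows_per_batch < 1:
        raise ValueError("rows_per_batch must be at least 1")
    iterator = iter(events)
    while True:
        # Pull only the next slice from the stream; earlier batches can be garbage collected.
        batch = list(islice(iterator, rows_per_batch))
        if not batch:
            return
        yield batch


def stream_in_batches(
    events: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical: wraps your client's batch call
    rows_per_batch: int = 50,
) -> List[float]:
    """Send events in bounded batches and return each request's latency in seconds."""
    latencies: List[float] = []
    for batch in chunked(events, rows_per_batch):
        started = time.perf_counter()
        send_batch(batch)
        latencies.append(time.perf_counter() - started)
    return latencies
```

With 120 events and `rows_per_batch=50`, `send_batch` is called three times, with 50, 50 and 20 events. The returned latencies are the per-request numbers to watch while tuning the batch size up or down.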

Hope that helps.


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/12/2012, at 7:18 AM, Ben Hood <0x6e6562@gmail.com> wrote:

> Thanks for the clarification Andrey. If that is the case, I had better ensure that I don't put the entire contents of a very long input stream into a single batch, since that is presumably going to cause a very large message to accumulate on the client side (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
>
> Cheers,
>
> Ben
>
> On Dec 7, 2012, at 17:24, Andrey Ilinykh <ailinykh@gmail.com> wrote:
>
>> Cassandra uses Thrift messages to pass data to and from the server. A batch is just a convenient way to create such a message. Nothing happens until you send this message. Probably, this is what you call "close the batch".
>>
>> Thank you,
>>   Andrey
>>
>> On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>> Hi,
>>
>> I'd like my app to stream a large number of events into Cassandra that originate from the same network input stream. If I create one batch mutation, can I just keep appending events to the Cassandra batch until I'm done, or are there some practical considerations about doing this (e.g. too much stuff buffering up on the client or server side, visibility of the data within the batch that hasn't been closed by the client yet)? Barring any discussion about atomicity, if I were able to stream a largish source into Cassandra, what would happen if the client crashed and didn't close the batch? Or is this kind of thing just a normal occurrence that Cassandra has to be aware of anyway?
>>
>> Cheers,
>>
>> Ben