From user-return-30580-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Wed Dec 12 12:21:56 2012
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
Subject: Re: Batch mutation streaming
References: <2D6EBD5C-E762-44F6-B12A-F170D4FCB658@gmail.com>
From: Ben Hood <0x6e6562@gmail.com>
Content-Type: multipart/alternative; boundary=Apple-Mail-10CA1258-20EE-49FB-B191-35650596D84D
X-Mailer: iPhone Mail (10A523)
Message-Id: <139296D0-4F71-4238-AD5E-A8AED842A497@gmail.com>
Date: Wed, 12 Dec 2012 12:21:13 +0000
To: "user@cassandra.apache.org"
Mime-Version: 1.0 (1.0)

--Apple-Mail-10CA1258-20EE-49FB-B191-35650596D84D
Content-Type: text/plain; charset=us-ascii

Hey Aaron,

That sounds sensible - thanks for the heads up.

Cheers,

Ben

On Dec 10, 2012, at 0:47, aaron morton wrote:

>> (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
> Yerp.
> And every row mutation in your batch becomes a task in the Mutation thread pool. If one replica gets 500 row mutations from one client request it will take a while for the (default) 32 threads to chew through them. While this is going on, other client requests will be effectively blocked.
>
> Depending on the number of clients, I would start with say 50 rows per mutation and keep an eye on the *request* latency.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/12/2012, at 7:18 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>
>> Thanks for the clarification Andrey. If that is the case, I had better ensure that I don't put the entire contents of a very long input stream into a single batch, since that is presumably going to cause a very large message to accumulate on the client side (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
>>
>> Cheers,
>>
>> Ben
>>
>> On Dec 7, 2012, at 17:24, Andrey Ilinykh wrote:
>>
>>> Cassandra uses Thrift messages to pass data to and from the server. A batch is just a convenient way to create such a message. Nothing happens until you send this message. Probably, this is what you call "close the batch".
>>>
>>> Thank you,
>>> Andrey
>>>
>>> On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I'd like my app to stream a large number of events into Cassandra that originate from the same network input stream. If I create one batch mutation, can I just keep appending events to the Cassandra batch until I'm done, or are there some practical considerations about doing this (e.g. too much stuff buffering up on the client or server side, visibility of the data within the batch that hasn't been closed by the client yet)? Barring any discussion about atomicity, if I were able to stream a largish source into Cassandra, what would happen if the client crashed and didn't close the batch? Or is this kind of thing just a normal occurrence that Cassandra has to be aware of anyway?
>>>>
>>>> Cheers,
>>>>
>>>> Ben
>

--Apple-Mail-10CA1258-20EE-49FB-B191-35650596D84D
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hey Aaron,

That sounds sensible - thanks for the heads up.

Cheers,

Ben

On Dec 10, 2012, at 0:47, aaron morton <aaron@thelastpickle.com> wrote:

(and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
Yerp.
And every row mutation in your batch becomes a task in the Mutation thread pool. If one replica gets 500 row mutations from one client request it will take a while for the (default) 32 threads to chew through them. While this is going on, other client requests will be effectively blocked.

Depending on the number of clients, I would start with say 50 rows per mutation and keep an eye on the *request* latency.

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/12/2012, at 7:18 AM, Ben Hood <0x6e6562@gmail.com> wrote:

Thanks for the clarification Andrey. If that is the case, I had better ensure that I don't put the entire contents of a very long input stream into a single batch, since that is presumably going to cause a very large message to accumulate on the client side (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).

Cheers,


Ben

On Dec 7, 2012, at 17:24, Andrey Ilinykh <ailinykh@gmail.com> wrote:

Cassandra uses Thrift messages to pass data to and from the server. A batch is just a convenient way to create such a message. Nothing happens until you send this message. Probably, this is what you call "close the batch".

Thank you,
  Andrey


On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x6e6562@gmail.com> wrote:
Hi,

I'd like my app to stream a large number of events into Cassandra that originate from the same network input stream. If I create one batch mutation, can I just keep appending events to the Cassandra batch until I'm done, or are there some practical considerations about doing this (e.g. too much stuff buffering up on the client or server side, visibility of the data within the batch that hasn't been closed by the client yet)? Barring any discussion about atomicity, if I were able to stream a largish source into Cassandra, what would happen if the client crashed and didn't close the batch? Or is this kind of thing just a normal occurrence that Cassandra has to be aware of anyway?

Cheers,

Ben


--Apple-Mail-10CA1258-20EE-49FB-B191-35650596D84D--
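
For anyone reading this thread in the archive, the advice above (don't put a whole input stream into one batch; start with something like 50 rows per mutation and watch request latency) can be sketched roughly as follows. This is only an illustration under stated assumptions: `chunked`, `stream_in_batches` and `send_batch` are hypothetical names that do not come from the thread or from any Cassandra client library, and `send_batch` stands in for whatever call your client uses to send one batch mutation. The default of 50 is just the starting point suggested in the thread, not a tuned value.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(events: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `events`.

    Only one chunk is held in memory at a time, so a very long input
    stream never accumulates into a single large message on the client.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stream_in_batches(
    events: Iterable[T],
    send_batch: Callable[[List[T]], None],
    rows_per_batch: int = 50,  # assumption: the starting point suggested in the thread
) -> int:
    """Send `events` in bounded batches and return the number of batches sent.

    `send_batch` is a hypothetical callable that wraps your client's
    batch-mutation call; nothing reaches the server until it is invoked.
    """
    sent = 0
    for batch in chunked(events, rows_per_batch):
        send_batch(batch)
        sent += 1
    return sent
```

If the client crashes partway through, only the batches already handed to `send_batch` have been sent; the chunk being assembled is simply lost, which matches the point made earlier in the thread that nothing happens until the message is sent. The batch size is best treated as a knob to adjust while watching request latency.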