From: DE VITO Dominique
To: user@cassandra.apache.org
Date: Mon, 6 May 2013 19:27:51 +0200
Subject: RE: cost estimate about some Cassandra patchs

> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Sunday, 28 April 2013 22:54
> To: user@cassandra.apache.org
> Subject: Re: cost estimate about some Cassandra patchs

> 

> > Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?

> 

 

> Assuming you mean "have the coordinator send multiple row read/write requests in a single message to replicas"

> 

> Pretty sure this has been raised as a ticket before but I cannot find one now. 

> 

> It would be a significant change and I'm not sure how big the benefit is. To send the messages, the coordinator places them in a queue, so there is little delay in sending. Then it waits on them asynchronously. So there may be some saving on networking, but from the coordinator's point of view I think the impact is minimal.

> 

> What is your use case?

 

Use case = rows with a row key like (folder id, file id).

Operations read/write multiple rows with the same folder id, so it could make sense to have a partitioner that puts rows with the same "folder id" on the same replicas.
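To make the placement idea concrete, here is a minimal sketch in Python rather than Cassandra's Java partitioner interface. Everything in it is an assumption made for illustration: the names `folder_token` and `replicas_for`, the use of MD5 as the hash, and the simplified ring with one token per node, where replicas are picked by walking the ring clockwise.

```python
import hashlib


def folder_token(row_key):
    """Hypothetical token scheme: hash only the folder id of a (folder_id, file_id) key."""
    folder_id, _file_id = row_key
    digest = hashlib.md5(str(folder_id).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas_for(row_key, ring, replication_factor):
    """Pick replicas by walking a sorted ring of (token, node) pairs clockwise.

    Simplified placement for illustration only: one token per node, no racks
    or data centers.
    """
    token = folder_token(row_key)
    # First ring position whose token is >= the key's token, wrapping to 0.
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

Because the file id never enters the hash, every row of a given folder gets the same token and therefore the same replica list under this scheme.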

 

But so far, Cassandra is not able to exploit this locality, as the batching effect ends at the coordinator node.
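The difference I have in mind can be sketched as follows. This is not how Cassandra's coordinator is implemented; `group_by_replica`, `message_counts` and the `replicas_for_key` callback are hypothetical names used only to compare one message per row per replica with one grouped message per replica.

```python
from collections import defaultdict


def group_by_replica(row_keys, replicas_for_key):
    """Group row keys by the replica node that should receive them.

    `replicas_for_key` is a caller-supplied function (an assumption of this
    sketch) returning the list of replica nodes for a row key.
    """
    batches = defaultdict(list)
    for key in row_keys:
        for node in replicas_for_key(key):
            batches[node].append(key)
    return dict(batches)


def message_counts(batches):
    """Return (messages if sent per row, messages if sent once per replica)."""
    per_row = sum(len(keys) for keys in batches.values())
    per_replica = len(batches)
    return per_row, per_replica
```

For example, with two rows of folder "f1" placed on nodes n1 and n2 and one row of folder "f2" placed on n2 and n3, the per-row approach sends 6 messages while the grouped approach would send 3.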

 

Hence my question about the cost estimate for patching Cassandra.

 

The closest JIRA entries I have found so far (perhaps corresponding exactly to my need?) are:

 

CASSANDRA-166: Support batch inserts for more than one key at once

https://issues.apache.org/jira/browse/CASSANDRA-166

=> "WON'T FIX" status

 

CASSANDRA-5034: Refactor to introduce Mutation Container in write path

https://issues.apache.org/jira/browse/CASSANDRA-5034

=> I am not sure whether it is related to my topic

 

Thanks.

 

Dominique

 

 

 

> 

> Cheers

> 

> 

> -----------------

> Aaron Morton

> Freelance Cassandra Consultant

> New Zealand

> 

> @aaronmorton

 

On 27/04/2013, at 4:04 AM, DE VITO Dominique <dominique.devito@thalesgroup.com> wrote:



Hi,

 

We have created a new partitioner that groups some rows with **different** row keys on the same replicas.

 

But neither batch_mutate nor multiget_slice is able to take advantage of this partitioner-defined placement to vectorize/batch communications between the coordinator and the replicas.

 

Does anyone know enough of the inner workings of Cassandra to tell me how much work is needed to patch Cassandra to enable such communication vectorization/batching?

 

Thanks.

 

Regards,

Dominique

 

 

 
