From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Tue, 11 Mar 2014 12:22:40 -0400
Subject: Re: How expensive are additional keyspaces?

So in the 0.6.X days a signature of a get looked something like this:

get(String keyspace, ColumnPath cp, String rowkey)

Besides the change from String -> ByteBuffer, the keyspace was pulled out
of the arguments.

I think the better, more flexible way to do this would be:

struct GetRequest {
   1: optional keyspace,
   2: required rowkey,
   3: optional columnPath
}

get(GetRequest g)

This would put some burden on clients to build request objects instead of
calling methods directly, but I think it would make the API easier to evolve.

However, it is hard for me to justify making a second copy of each method
for this small use case. Otherwise I would take that up.
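
For illustration only, here is a rough Java sketch of what the client-side "builder object" for such a GetRequest might look like. Every name in it (GetRequest, Builder, forRow) is hypothetical, nothing here exists in Cassandra's Thrift API, and a plain String stands in for ColumnPath to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical request object mirroring the GetRequest struct sketched above.
final class GetRequest {
    private final String keyspace;    // optional: null means "use the connection's keyspace"
    private final ByteBuffer rowKey;  // required
    private final String columnPath;  // optional; a String stands in for ColumnPath

    private GetRequest(Builder b) {
        this.keyspace = b.keyspace;
        this.rowKey = b.rowKey;
        this.columnPath = b.columnPath;
    }

    Optional<String> keyspace()   { return Optional.ofNullable(keyspace); }
    ByteBuffer rowKey()           { return rowKey.duplicate(); }
    Optional<String> columnPath() { return Optional.ofNullable(columnPath); }

    // The required field is taken up front; optional fields are set on the builder.
    static Builder forRow(ByteBuffer rowKey) { return new Builder(rowKey); }

    static final class Builder {
        private final ByteBuffer rowKey;
        private String keyspace;
        private String columnPath;

        private Builder(ByteBuffer rowKey) {
            if (rowKey == null) throw new IllegalArgumentException("rowKey is required");
            this.rowKey = rowKey;
        }

        Builder keyspace(String ks)     { this.keyspace = ks; return this; }
        Builder columnPath(String path) { this.columnPath = path; return this; }
        GetRequest build()              { return new GetRequest(this); }
    }
}

final class GetRequestDemo {
    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap("row1".getBytes(StandardCharsets.UTF_8));
        GetRequest req = GetRequest.forRow(key).keyspace("config").build();
        System.out.println(req.keyspace().orElse("<connection default>"));
        System.out.println(req.columnPath().isPresent());
    }
}
```

The point of the shape is that a new optional field can be added later without changing the signature of get(GetRequest), which is the "easier to evolve" property mentioned above.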




On Tue, Mar 11, 2014 at 12:07 PM, Peter Lin <woolfel@gmail.com> wrote:

if I have time this summer, I may work on that, since I like having thrift.


On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
This mistake is not a Thrift limitation. In 0.6.X you could switch
keyspaces without calling setKeyspace(String), because methods specified
the keyspace in every operation. This mirrors the StorageProxy class. In
0.7.X setKeyspace() was created and the keyspace was removed from all these
Thrift methods. I really dislike that change personally :)

If someone were so motivated, they could pretty easily (a couple of days'
work) add new methods to Thrift that do not have this limitation.
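
As a rough illustration of why making the keyspace part of connection state multiplies connection pools, here is a toy Java model. All classes are hypothetical and do not correspond to Cassandra's Thrift-generated client or to any real pooling library. Creating a BoundConnection stands in for opening a connection and issuing the setKeyspace RPC.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model: a connection whose keyspace is fixed once it has been set.
final class BoundConnection {
    final String keyspace;
    BoundConnection(String keyspace) { this.keyspace = keyspace; }
}

// 0.7-style: the keyspace lives on the connection, so an idle connection can
// only be reused for the keyspace it is bound to. That means one pool per keyspace.
final class PerKeyspacePools {
    private final Map<String, Deque<BoundConnection>> pools = new HashMap<>();

    BoundConnection borrow(String keyspace) {
        Deque<BoundConnection> pool =
                pools.computeIfAbsent(keyspace, k -> new ArrayDeque<>());
        BoundConnection idle = pool.poll();
        // No idle connection: "connect" and bind it to this keyspace.
        return idle != null ? idle : new BoundConnection(keyspace);
    }

    void giveBack(BoundConnection c) {
        pools.computeIfAbsent(c.keyspace, k -> new ArrayDeque<>()).push(c);
    }

    int poolCount() { return pools.size(); }
}

final class PoolDemo {
    public static void main(String[] args) {
        PerKeyspacePools pools = new PerKeyspacePools();
        for (int i = 0; i < 100; i++) {
            pools.giveBack(pools.borrow("ks" + i));
        }
        System.out.println(pools.poolCount()); // prints 100: one pool per keyspace used
    }
}
```

With the 0.6-style signatures, where every call carries its keyspace, a single pool of unbound connections would serve all keyspaces in this model.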




On Tue, Mar 11, 2014 at 11:39 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
That is correct.  Another place where the mistakes of Thrift informed
our development of the native protocol.

On Tue, Mar 11, 2014 at 10:08 AM, Keith Wright <kwright@nanigans.com> wrote:
> Does this hold true for the native protocol?  I've noticed that you can
> create a session object in the DataStax driver without specifying a keyspace
> and so long as you include the keyspace in all queries instead of just the
> table name, it works fine.  In that case, I assume there's only one
> connection pool for all keyspaces.
>
> From: Edward Capriolo <edlinuxguru@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, March 11, 2014 at 11:05 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: How expensive are additional keyspaces?
>
> The biggest expense of them is that you need to be authenticated to a
> keyspace to perform an operation. Thus connection pools are bound to
> keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
> if you have 100 keyspaces you need 100 connection pools, which starts to
> be a pain very quickly.
>
> I suggest keeping everything in one keyspace unless you really need
> different replication factors and/or network replication settings per
> keyspace.
>
>
> On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer <elreydetodo@gmail.com>
> wrote:
>>
>> Hey all -
>>
>> My company is working on introducing a configuration service system to
>> provide config data to several of our applications, to be backed by
>> Cassandra. We're already using Cassandra for other services, and at
>> the moment our pending design just puts all the new tables (9 of them,
>> I believe) in one of our pre-existing keyspaces.
>>
>> I've got a few questions about keyspaces that I'm hoping for input on.
>> Some Google hunting didn't turn up obvious answers, at least not for
>> recent versions of Cassandra.
>>
>> 1) What trade offs are being made by using a new keyspace versus
>> re-purposing an existing one (that is in active use by another
>> application)? Organization is the obvious answer, I'm looking for any
>> technical reasons.
>>
>> 2) Is there any per-keyspace overhead incurred by the cluster?
>>
>> 3) Does it impact on-disk layout at all for tables to be in a
>> different keyspace from others? Is any sort of file fragmentation
>> potentially introduced just by doing this in a new keyspace as opposed
>> to an existing one?
>>
>> 4) Does it add any metadata overhead to the system keyspace?
>>
>> 5) Why might we *not* want to make a separate keyspace for this?
>>
>> 6) Does anyone have experience with creating additional keyspaces to
>> the point that Cassandra can no longer handle it? Note that we're
>> *not* planning to do this, I'm just curious.
>>
>> Cheers,
>> Martin
>
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


