Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of edlinuxguru@gmail.com
 designates 209.85.212.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKv2g8dO9qSfZaH=BR=e2_1-MeSAtByEfRX0SRCbcpNTr=GUAA@mail.gmail.com>
References: 
 <CAENxBwwwNcYq5R+KfgmQcq=QWg6i0niGq+P0e4wYSpJc498ggw@mail.gmail.com>
	<CF449EEC.21FDF%kwright@nanigans.com>
	<CALdd-zjK6dir-NekumCQthAnV+-UMWaNWTMz30W46EJTo2qMFw@mail.gmail.com>
	<CAENxBwxq74ORVDAGtpGQYMMS-Aj0XjE9WWA3A30=L7r2c-Dz2Q@mail.gmail.com>
	<CAKv2g8dO9qSfZaH=BR=e2_1-MeSAtByEfRX0SRCbcpNTr=GUAA@mail.gmail.com>
Date: Tue, 11 Mar 2014 12:22:40 -0400
Message-ID: 
 <CAENxBwz8kJXQSoOpFS3pNYpanHqamCAJs4DG+XwaiSxv9Hp=fA@mail.gmail.com>
Subject: Re: How expensive are additional keyspaces?
From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=001a11c37c920a442204f45721c5

--001a11c37c920a442204f45721c5
Content-Type: text/plain; charset=ISO-8859-1

So in the 0.6.X days a signature of a get looked something like this:

get(String keyspace, ColumnPath cp, String rowkey)

Besides the change from String -> ByteBuffer, the keyspace was pulled out of
the argument list.

I think a better, more flexible way to do this would be:

struct GetRequest {
   1: optional string keyspace,
   2: required binary rowkey,
   3: optional ColumnPath columnPath
}

get(GetRequest g)

This would put some burden on clients to build request objects instead of
calling methods directly, but it would make the API easier to evolve, I think.
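
To make the client-side burden concrete, here is a rough Java sketch of what
building such a request could look like. Everything in it is hypothetical: the
GetRequest class and its builder are not part of Cassandra's Thrift API, and a
plain String stands in for ColumnPath so the example stays self-contained.

```java
import java.nio.ByteBuffer;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical request object; NOT part of Cassandra's Thrift interface. */
final class GetRequest {
    private final String keyspace;    // optional: null means "use the connection's current keyspace"
    private final ByteBuffer rowKey;  // required
    private final String columnPath;  // optional; a String stands in for Thrift's ColumnPath

    private GetRequest(Builder b) {
        this.keyspace = b.keyspace;
        this.rowKey = b.rowKey;
        this.columnPath = b.columnPath;
    }

    /** The only required field is passed up front, so it cannot be forgotten. */
    static Builder builder(ByteBuffer rowKey) {
        return new Builder(rowKey);
    }

    Optional<String> keyspace() { return Optional.ofNullable(keyspace); }

    /** Returns an independent view so callers cannot move the stored buffer's position. */
    ByteBuffer rowKey() { return rowKey.duplicate(); }

    Optional<String> columnPath() { return Optional.ofNullable(columnPath); }

    static final class Builder {
        private final ByteBuffer rowKey;
        private String keyspace;
        private String columnPath;

        private Builder(ByteBuffer rowKey) {
            this.rowKey = Objects.requireNonNull(rowKey, "rowKey is required");
        }

        Builder keyspace(String keyspace) { this.keyspace = keyspace; return this; }

        Builder columnPath(String columnPath) { this.columnPath = columnPath; return this; }

        GetRequest build() { return new GetRequest(this); }
    }
}
```

A caller would write something like
GetRequest.builder(key).keyspace("config").build() and pass the result to a
single get(GetRequest) method. A new optional field then only needs a new
builder method, not a new method signature.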

However, it is hard for me to justify making a second copy of each method
for this small use case. Otherwise I would take that up.


On Tue, Mar 11, 2014 at 12:07 PM, Peter Lin <woolfel@gmail.com> wrote:

>
> if I have time this summer, I may work on that, since I like having thrift.
>
>
> On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> This mistake is not a thrift limitation. In 0.6.X you could switch
>> keyspaces without calling setKeyspace(String); the methods specified the
>> keyspace in every operation. This mirrors the StorageProxy class. In
>> 0.7.X setKeyspace() was created and the keyspace was removed from all these
>> thrift methods. I really dislike that change personally :)
>>
>> If someone were so motivated, they could pretty easily (a couple of days'
>> work) add new methods to thrift that do not have this limitation.
>>
>>
>>
>>
>> On Tue, Mar 11, 2014 at 11:39 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>>> That is correct.  Another place where the mistakes of Thrift informed
>>> our development of the native protocol.
>>>
>>> On Tue, Mar 11, 2014 at 10:08 AM, Keith Wright <kwright@nanigans.com>
>>> wrote:
>>> > Does this hold true for the native protocol?  I've noticed that you can
>>> > create a session object in the datastax driver without specifying a keyspace
>>> > and so long as you include the keyspace in all queries instead of just the
>>> > table name, it works fine.  In that case, I assume there's only one
>>> > connection pool for all keyspaces.
>>> >
>>> > From: Edward Capriolo <edlinuxguru@gmail.com>
>>> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> > Date: Tuesday, March 11, 2014 at 11:05 AM
>>> > To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> > Subject: Re: How expensive are additional keyspaces?
>>> >
>>> > The biggest expense of them is that you need to be authenticated to a
>>> > keyspace to perform an operation. Thus connection pools are bound to
>>> > keyspaces. Switching a keyspace is an RPC operation. In the thrift client,
>>> > if you have 100 keyspaces you need 100 connection pools, which starts to
>>> > be a pain very quickly.
>>> >
>>> > I suggest keeping everything in one keyspace unless you really need
>>> > different replication factors and/or network replication settings per
>>> > keyspace.
>>> >
>>> >
>>> > On Tue, Mar 11, 2014 at 10:17 AM, Martin Meyer <elreydetodo@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hey all -
>>> >>
>>> >> My company is working on introducing a configuration service system to
>>> >> provide config data to several of our applications, to be backed by
>>> >> Cassandra. We're already using Cassandra for other services, and at
>>> >> the moment our pending design just puts all the new tables (9 of them,
>>> >> I believe) in one of our pre-existing keyspaces.
>>> >>
>>> >> I've got a few questions about keyspaces that I'm hoping for input on.
>>> >> Some Google hunting didn't turn up obvious answers, at least not for
>>> >> recent versions of Cassandra.
>>> >>
>>> >> 1) What trade offs are being made by using a new keyspace versus
>>> >> re-purposing an existing one (that is in active use by another
>>> >> application)? Organization is the obvious answer, I'm looking for any
>>> >> technical reasons.
>>> >>
>>> >> 2) Is there any per-keyspace overhead incurred by the cluster?
>>> >>
>>> >> 3) Does it impact on-disk layout at all for tables to be in a
>>> >> different keyspace from others? Is any sort of file fragmentation
>>> >> potentially introduced just by doing this in a new keyspace as opposed
>>> >> to an existing one?
>>> >>
>>> >> 4) Does it add any metadata overhead to the system keyspace?
>>> >>
>>> >> 5) Why might we *not* want to make a separate keyspace for this?
>>> >>
>>> >> 6) Does anyone have experience with creating additional keyspaces to
>>> >> the point that Cassandra can no longer handle it? Note that we're
>>> >> *not* planning to do this, I'm just curious.
>>> >>
>>> >> Cheers,
>>> >> Martin
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder, http://www.datastax.com
>>> @spyced
>>>
>>
>>
>

--001a11c37c920a442204f45721c5--