From: Eric Stevens
Date: Wed, 22 Feb 2017 23:39:19 +0000
Subject: Re: Pluggable throttling of read and write queries
To: user@cassandra.apache.org

> We've actually had several customers where we've done the opposite - split large clusters apart to separate use cases

We do something similar, but for a single application: we functionally shard data to different clusters from within that one application. We can use different server classes for different types of workloads, we can grow and size clusters accordingly, and we also do things like time sharding so that at-rest data can go to cheaper storage options.

I agree with the general sentiment here that (at least as it stands today) a monolithic cluster for many applications does not compete with per-application clusters unless cost is no issue. At our scale, the terabytes of C* data we take in per day mean that even very small cost savings really add up. And even where cost is no issue, the additional isolation and workload tailoring are still highly valuable.

On Wed, Feb 22, 2017 at 12:01 PM Edward Capriolo wrote:

> On Wed, Feb 22, 2017 at 1:20 PM, Abhishek Verma wrote:
>
> We have lots of dedicated Cassandra clusters for large use cases, but we have a long tail (~100) of internal customers who want to store < 200 GB of non-critical data at < 5k QPS. It does not make sense to create a 3-node dedicated cluster for each of these small use cases, so we have a shared cluster into which we onboard these users.
>
> But once in a while, one of the customers will run an ingest job from HDFS which will pound the shared cluster and break the cluster's SLA for all the other customers. Currently, I don't see any way to signal backpressure to the ingestion jobs or to throttle their requests. Another example is one customer doing a large number of range queries, which has the same effect.
>
> A simple way to avoid this is to throttle the read or write requests based on some quota limits for each keyspace or user.
>
> Please see replies inlined:
>
> On Mon, Feb 20, 2017 at 11:46 PM, vincent gromakowski <vincent.gromakowski@gmail.com> wrote:
>
> Aren't you using the Mesos Cassandra framework to manage your multiple clusters? (I saw a presentation at the Cassandra summit.)
>
> Yes, we are using https://github.com/mesosphere/dcos-cassandra-service and contribute heavily to it. I am aware of the presentation (https://www.youtube.com/watch?v=4Ap-1VT2ChU) at the Cassandra summit, as I was the one who gave it :) This has helped us automate the creation and management of these clusters.
>
> What's wrong with your current Mesos approach?
>
> Hardware efficiency: spinning up dedicated clusters for each use case wastes a lot of hardware resources. One of the approaches we have taken is spinning up multiple Cassandra nodes belonging to different clusters on the same physical machine. However, we still have the overhead of managing these separate multi-tenant clusters.
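To make the ingest problem above concrete: in the absence of server-side quotas, a bulk loader can at least bound its own concurrency so it cannot flood a shared cluster. A minimal sketch, assuming the DataStax Java driver 3.x; the class name and the 256-permit cap are made up for illustration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;

    import java.util.concurrent.Semaphore;

    // Caps the number of concurrent async writes issued by a bulk loader so a
    // single ingest job cannot flood a shared cluster. The 256-permit cap is
    // only an illustration; a real job would tune it per cluster.
    public class BoundedIngestWriter {
        private final Session session;
        private final Semaphore inFlight;

        public BoundedIngestWriter(Session session, int maxInFlight) {
            this.session = session;
            this.inFlight = new Semaphore(maxInFlight);
        }

        public void write(Statement statement) throws InterruptedException {
            inFlight.acquire();  // blocks the producer once the cap is reached
            ResultSetFuture future = session.executeAsync(statement);
            // Release the permit when the write completes (success or failure);
            // a real loader would also inspect the future for errors.
            future.addListener(inFlight::release, Runnable::run);
        }

        public static void main(String[] args) throws Exception {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                BoundedIngestWriter writer = new BoundedIngestWriter(session, 256);
                // writer.write(...) would be called from the ingest loop here.
            }
        }
    }

Of course this only helps for writers that opt in, which is exactly the limitation raised in this thread: anything enforced purely client-side has to be adopted by every client.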
> I am also thinking it's better to split a large cluster into smaller ones, except if you also manage the client layer that queries Cassandra, where you can put some backpressure or rate limiting.
>
> We have an internal storage API layer that some of the clients use, but there are many customers who use the vanilla DataStax Java or Python driver. Implementing throttling in each of those clients does not seem like a viable approach.
>
> On 21 Feb 2017 at 2:46 AM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:
>
> Older versions had a request scheduler API.
>
> I am not aware of the history behind it. Can you please point me to the JIRA tickets and/or the reason it was removed?
>
> On Monday, February 20, 2017, Ben Slater wrote:
>
> We've actually had several customers where we've done the opposite - split large clusters apart to separate use cases. We found that this allowed us to better align hardware with use case requirements (for example, using AWS c3.2xlarge for very hot data at low latency and m4.xlarge for more general-purpose data); we can also tune JVM settings, etc., to meet those use cases.
>
> There have been several instances where we have moved customers out of the shared cluster to their own dedicated clusters because they outgrew our limitations. But I don't think it makes sense to move all the small use cases into their own separate clusters.
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
>
> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma wrote:
>
> Cassandra is being used on a large scale at Uber. We usually create dedicated clusters for each of our internal use cases; however, that is difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with 100s of nodes and handling 10s to 100s of different use cases for different products in the same cluster. We can define different keyspaces for each of them, but that does not help in the case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or face noisy neighbor issues?
>
> Hi,
>
> We've never tried this approach, and given my limited experience I would find it a terrible idea from a maintenance perspective (remember the old saying about baskets and eggs?).
>
> What if you have a limited number of baskets and several eggs which are not critical if they break rarely?
>
> What potential benefits do you see?
>
> The main benefit of sharing a single cluster among several small use cases is increasing hardware efficiency and decreasing the management overhead of a large number of clusters.
>
> Thanks everyone for your replies and questions.
>
> -Abhishek.
>
> I agree with these assertions. On one hand I think about a "managed service" like, say, Amazon DynamoDB. They likely start with very, very large footprints, i.e. they commission huge clusters of the fastest SSD hardware. Next, every application/user has a quota. They can always control the basic load because they control the quota.
>
> Control at the hardware level makes sense, but then your unit of management is "a cluster". Users no longer have a unified API; they have switch statements: this data goes to cluster x, that data to cluster y. You still end up with cases where degenerate usage patterns affect others.
>
> With Cassandra it would be nice if these controls were built into the API. This could also help you build your own chargeback model in the enterprise.
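Until such controls exist server-side, the closest approximation is the internal storage API layer mentioned above: a thin wrapper that enforces per-keyspace quotas before handing statements to the driver. A rough sketch, assuming the DataStax Java driver 3.x and Guava's RateLimiter; the class name, the quota map, and the 1000 ops/sec default are hypothetical:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.google.common.util.concurrent.RateLimiter;

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Per-keyspace throttling in a storage-API layer that wraps the driver's
    // Session. Each keyspace gets its own token bucket; callers over quota are
    // slowed down rather than rejected.
    public class ThrottledSession {
        private static final double DEFAULT_OPS_PER_SEC = 1000.0;  // illustrative default

        private final Session session;
        private final Map<String, Double> quotas;                   // keyspace -> ops/sec
        private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();

        public ThrottledSession(Session session, Map<String, Double> quotas) {
            this.session = session;
            this.quotas = quotas;
        }

        public ResultSet execute(Statement statement) {
            String keyspace = statement.getKeyspace();               // may be null for simple statements
            String key = keyspace == null ? "<unknown>" : keyspace;
            double opsPerSec = quotas.getOrDefault(key, DEFAULT_OPS_PER_SEC);
            limiters.computeIfAbsent(key, k -> RateLimiter.create(opsPerSec))
                    .acquire();                                      // block until a permit is available
            return session.execute(statement);
        }
    }

It only protects against clients that actually go through the wrapper, and it slows offenders down rather than rejecting them, but it needs no changes to Cassandra itself.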
> Sure, as someone pointed out, rejecting reads stinks for that user. But then again, someone has to decide who pays for the hardware and how.
>
> For example, imagine a company with 19 business units all using the same Cassandra cluster. One business unit might account for 90% of the storage but 1% of the requests. Another business unit might be 95% of the requests but 1% of the data. How do you come up with a billing model? For the customer with 95% of the requests, their "cost" on the system is young-generation GC and network.
>
> DataStax Enterprise had/has the concept of "the analytics DC". The concept is "real time goes here" and "analytics goes there"; with the right resource controls you could get much more fine-grained than that. It will never be perfect - there will always be that random abuser with the "aggregate allow filtering query" - but there are ways to move in a more managed direction.
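The "real time goes here, analytics goes there" separation is largely a matter of data-center-aware routing on the client. A small sketch, again assuming the DataStax Java driver 3.x; the DC name "analytics", the contact point, and the ks.events table are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

    // Pins an analytics client to its own data center so heavy scans stay off
    // the real-time DC. The DC name, contact point, and table are placeholders.
    public class AnalyticsClient {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.10")
                    .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                            .withLocalDc("analytics")
                            .build())
                    .build();
                 Session session = cluster.connect()) {
                // Queries issued through this session are routed to nodes in the
                // "analytics" DC by default, keeping batch load away from the
                // real-time data center.
                session.execute("SELECT count(*) FROM ks.events");
            }
        }
    }

Combined with per-DC replication settings and LOCAL_* consistency levels, this keeps batch scans off the real-time data center even without per-user quotas.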