cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Tools to manage repairs
Date Fri, 28 Oct 2016 16:32:47 GMT
On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann <me@vrischmann.me>
wrote:

> Doesn't paging help with this ? Also if we select a range via the cluster
> key we're never really selecting the full partition. Or is that wrong ?
>
>
> On Fri, Oct 28, 2016, at 05:00 PM, Edward Capriolo wrote:
>
> Big partitions are an anti-pattern; here is why:
>
> First Cassandra is not an analytic datastore. Sure it has some UDFs and
> aggregate UDFs, but the true purpose of the data store is to satisfy point
> reads. Operations have strict timeouts:
>
> # How long the coordinator should wait for read operations to complete
> read_request_timeout_in_ms: 5000
>
> # How long the coordinator should wait for seq or index scans to complete
> range_request_timeout_in_ms: 10000
>
> This means you need to be able to satisfy the operation in 5 seconds.
> Which is not only the "think time" for 1 server, but if you are doing a
> quorum the operation has to complete and compare on 2 or more servers.
> Beyond these cutoffs are thread pools which fill up and start dropping
> requests once full.
>
> Something has to give, either functionality or physics. Particularly the
> physics of aggregating an ever-growing data set across N replicas in less
> than 5 seconds. How many 2 ms point reads will be blocked by 50 ms
> queries, etc.?
>
> I do not see the technical limitations of big partitions on disk as the
> only hurdle to climb here.
>
>
> On Fri, Oct 28, 2016 at 10:39 AM, Alexander Dejanovski <
> alex@thelastpickle.com> wrote:
>
> Hi Eric,
>
> that would be https://issues.apache.org/jira/browse/CASSANDRA-9754 by
> Michael Kjellman and https://issues.apache.org/jira/browse/CASSANDRA-11206 by
> Robert Stupp.
> If you haven't seen it yet, Robert's summit talk on big partitions is
> totally worth it :
> Video : https://www.youtube.com/watch?v=N3mGxgnUiRY
> Slides : http://www.slideshare.net/DataStax/myths-of-big-partitions-robert-stupp-datastax-cassandra-summit-2016
>
> Cheers,
>
>
> On Fri, Oct 28, 2016 at 4:09 PM Eric Evans <john.eric.evans@gmail.com>
> wrote:
>
> On Thu, Oct 27, 2016 at 4:13 PM, Alexander Dejanovski
> <alex@thelastpickle.com> wrote:
> > A few patches are pushing the limits of partition sizes so we may soon be
> > more comfortable with big partitions.
>
> You don't happen to have Jira links to these handy, do you?
>
>
> --
> Eric Evans
> john.eric.evans@gmail.com
>
>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
"Doesn't paging help with this ? Also if we select a range via the cluster
key we're never really selecting the full partition. Or is that wrong ?"

What I am suggesting is that the data store has had this practical
limitation on partition size since inception. As a result, the common use
case is not to use it in such a way. For example, the compaction manager
may not be optimized for these cases, queries running across large
partitions may cause more contention or lots of young gen garbage, queries
running across large partitions may occupy the slots of the read stage, etc.
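
To put rough numbers on the read stage point, here is a back-of-the-envelope
sketch in Python using the 2 ms and 50 ms figures quoted earlier in the
thread. The slot count of 32 is an assumption (pick whatever your
concurrent_reads is set to), the function names are made up for illustration,
and it ignores queueing, disk and GC effects entirely.

```python
def point_reads_displaced(slow_query_ms: float, point_read_ms: float) -> float:
    """Point reads one read-stage slot could have served while a slow query holds it."""
    return slow_query_ms / point_read_ms


def point_read_capacity(slots: int, busy_slots: int, point_read_ms: float) -> float:
    """Point reads per second that the slots NOT held by slow queries can serve."""
    free_slots = max(slots - busy_slots, 0)
    return free_slots * (1000.0 / point_read_ms)


if __name__ == "__main__":
    # Assumed numbers: 32 read-stage slots, 2 ms point reads, 50 ms partition scans.
    print(point_reads_displaced(50, 2))    # reads displaced by one slow query
    print(point_read_capacity(32, 0, 2))   # no slots held by slow queries
    print(point_read_capacity(32, 24, 2))  # 24 of 32 slots held by slow queries
```

With these assumed numbers, each 50 ms query costs one slot the equivalent of
25 point reads, and losing 24 of 32 slots to slow queries cuts point read
capacity to a quarter.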


http://mail-archives.apache.org/mod_mbox/cassandra-user/201602.mbox/%3CCAJjpQyTS2eaCcRBVa=ZmM-hcBX5nF4ovC1enW+SFfGwvngOi7g@mail.gmail.com%3E

I think there are possibly some more "little details" to discover. Not in a
bad way. I just do not think you can hand-wave it away, as if a specific
thing someone is working on now, or paging, solves it. If it were that easy
it would be solved by now :)
