cassandra-user mailing list archives

From Ben Slater <ben.sla...@instaclustr.com>
Subject Re: DELETE/SELECT with multi-column PK and IN
Date Thu, 09 Feb 2017 10:28:50 GMT
That’s a very good point from Sylvain that I forgot/missed. That said,
we’ve seen plenty of scenarios where overall system throughput is improved
through unlogged batches. One of my colleagues did quite a bit of
benchmarking on this topic for his talk at last year’s C* summit:
http://www.slideshare.net/DataStax/microbatching-highperformance-writes-adam-zegelin-instaclustr-cassandra-summit-2016

On Thu, 9 Feb 2017 at 20:52 Benjamin Roth <benjamin.roth@jaumo.com> wrote:

> Ok got it.
>
> But it's interesting that this is supported:
> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>
> This is technically mostly the same (Token awareness,
> coordination/routing, read performance, ...), right?
>
> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylvain@datastax.com>:
>
> This is a statement on multiple partitions, and the code does no real
> internal optimization for that. In fact, I strongly advise you not to use
> a batch but rather to simply do a for loop client side and send the
> statements individually. That way, your driver will be able to use proper
> token-awareness for each request (whereas if you send a batch, one
> coordinator will be picked and will have to forward most statements,
> doing more network hops at the end of the day). The only case where using a
> batch is indeed legit is if you care about all the statements being atomic,
> but in that case it's a logged batch you want.
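
A minimal sketch of the client-side loop described above, in Python. It assumes a session object exposing `prepare` and `execute_async` in the shape the DataStax Python driver uses, and a driver configured with a token-aware load balancing policy; `delete_each`, `StubSession`, and the `pk1`/`pk2` column names are illustrative only. The stub stands in for a real cluster so the sketch runs on its own.

```python
from typing import Any, Iterable, List, Tuple

DELETE_CQL = "DELETE FROM ks.cf WHERE pk1 = ? AND pk2 = ?"


def delete_each(session: Any, keys: Iterable[Tuple[int, int]]) -> List[Any]:
    """Send one DELETE per partition key, then wait for every reply."""
    prepared = session.prepare(DELETE_CQL)
    # One request per key: a token-aware driver can route each request
    # to a replica that owns that partition.
    futures = [session.execute_async(prepared, key) for key in keys]
    return [future.result() for future in futures]


class _StubFuture:
    """Hypothetical stand-in for the future a driver would return."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def result(self) -> Any:
        return self._value


class StubSession:
    """Hypothetical stand-in for a driver session; records what is sent."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, Tuple[int, ...]]] = []

    def prepare(self, cql: str) -> str:
        return cql

    def execute_async(self, statement: str, params: Tuple[int, ...]) -> _StubFuture:
        self.sent.append((statement, tuple(params)))
        return _StubFuture(None)


session = StubSession()
results = delete_each(session, [(1, 2), (1, 3), (2, 3), (3, 4)])
print(len(session.sent), "statements sent")
```

Sending all requests before waiting on any of them keeps the loop from paying one full round trip per key.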
>
> That's btw more or less why we never bothered implementing that: it's
> totally doable technically, but it's not really such a good idea
> performance-wise in practice most of the time, and you can easily work
> around it with a batch if you need atomicity.
>
> Which is not to say it never will be or shouldn't be supported btw, there
> is something to be said for the consistency of the CQL language in general.
> But it's why no-one took the time to do it so far.
>
> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.roth@jaumo.com>
> wrote:
>
> Yes, that's the workaround - I'll try that.
>
> Would you agree it would be better for internal optimizations to process
> this within a single statement?
>
> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.slater@instaclustr.com>:
>
> Yep, that makes it clear. I think an unlogged batch of prepared statements
> with one statement per PK tuple would be roughly equivalent? And probably
> no more complex to generate in the client?
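
For illustration, the batch suggested here can be spelled out as plain CQL text with one DELETE per PK tuple. This is only a sketch: `unlogged_delete_batch` is a hypothetical helper, the `pk1`/`pk2` column names are assumed, and it builds text plus a flat parameter list rather than using any driver's batch API.

```python
from typing import List, Sequence, Tuple


def unlogged_delete_batch(keys: Sequence[Tuple[int, int]]) -> Tuple[str, List[int]]:
    """Build an UNLOGGED BATCH with one DELETE per key, plus its bind values."""
    statements = ["  DELETE FROM ks.cf WHERE pk1 = ? AND pk2 = ?;" for _ in keys]
    cql = "\n".join(["BEGIN UNLOGGED BATCH", *statements, "APPLY BATCH;"])
    # Bind values in statement order: pk1, pk2 of the first key, then the next.
    params = [value for key in keys for value in key]
    return cql, params


batch_cql, batch_params = unlogged_delete_batch([(1, 2), (1, 3), (2, 3), (3, 4)])
print(batch_cql)
print(batch_params)
```

All of these statements go to a single coordinator, which is the trade-off raised elsewhere in this thread.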
>
> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.roth@jaumo.com> wrote:
>
> Maybe that makes it clear:
>
> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
> 3), (2, 3), (3, 4));
>
> I want to delete or select a bunch of records identified by their
> multi-partitionkey tuples.
>
> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.slater@instaclustr.com>:
>
> Are you looking for this to be equivalent to (PK1=1 AND PK2=2) or are you
> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>
> Cheers
> Ben
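
The two readings in this question select different sets of keys. A small Python sketch, using the tuples from the DELETE example in this thread (the variable names are illustrative), shows that separate IN restrictions on each column match the full cross product and not only the listed tuples:

```python
from itertools import product

# Reading 1: exactly the listed (pk1, pk2) tuples.
wanted = [(1, 2), (1, 3), (2, 3), (3, 4)]

# Reading 2: pk1 IN (...) AND pk2 IN (...), built from the same values.
pk1_values = sorted({pk1 for pk1, _ in wanted})
pk2_values = sorted({pk2 for _, pk2 in wanted})
cross = list(product(pk1_values, pk2_values))

# Keys matched by reading 2 that reading 1 never asked for.
extra = [key for key in cross if key not in wanted]

print(len(wanted), "keys from the tuple list")
print(len(cross), "keys from the two IN restrictions")
print("extra keys:", extra)
```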
>
> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.roth@jaumo.com> wrote:
>
> Hi Guys,
>
> CQL says this is not allowed:
>
> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>
> 1. Is there a reason for it? There shouldn't be a performance penalty: it
> is a PK lookup, and the same thing works with a single pk column.
> 2. Is there a known workaround for it?
>
> It would be a big help to have it for daily business. IMHO it's a
> waste of resources to run multiple queries just to fetch a bunch of records
> by a PK.
>
> Thanks in advance for any reply
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
-- 
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
