Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id EFAB6200C16 for ; Thu, 9 Feb 2017 11:29:09 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id EE343160B50; Thu, 9 Feb 2017 10:29:09 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2F92F160B4C for ; Thu, 9 Feb 2017 11:29:08 +0100 (CET) Received: (qmail 92941 invoked by uid 500); 9 Feb 2017 10:29:06 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 92931 invoked by uid 99); 9 Feb 2017 10:29:06 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Feb 2017 10:29:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 166A4C0E97 for ; Thu, 9 Feb 2017 10:29:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.78 X-Spam-Level: * X-Spam-Status: No, score=1.78 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=instaclustr-com.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id f66BSZ0KJ0b6 for ; Thu, 9 Feb 2017 10:29:02 +0000 (UTC) Received: from mail-qt0-f172.google.com (mail-qt0-f172.google.com [209.85.216.172]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 593035FC5F for ; Thu, 9 Feb 2017 10:29:02 +0000 (UTC) Received: by mail-qt0-f172.google.com with SMTP id v23so192311617qtb.0 for ; Thu, 09 Feb 2017 02:29:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=instaclustr-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=uBL91CRTCdq3KlBGVXJYen9hQxHBdKKZR/r8I4o5iEA=; b=OV11zud8B6xU3AFJVbNLbSexkjC4LmlmmeH8ShMS+ho0D4L7+a00e+pw0hqXaarCCF mM38qLUg8zypfBI/Ri9vJqnDotgkz197laBA5huA4EeaIt2nbw9h/mzhIkLVZUjkWeqK /Fa4WCWeKU30j5uSJyo0CgIUgHqENNgqMo5QFWOwupH+u90LuwiY2WJqYvXTN0hN6CGF aE9rDK5WOEqX0IlcpYfkxfEGIZf6QR8ZQusNeCrcvuLaePKi8io3Dnv3ZbHDCiK+nNxZ az9qIj0K1l2UMxKRCsTlZw4K/oG9MBgpem/3jfDoYybri1wfK0gqzcqK9ZOat2/lnVbe wzMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=uBL91CRTCdq3KlBGVXJYen9hQxHBdKKZR/r8I4o5iEA=; b=SXhKRqQ+Xki1qsgh0N65R+PdcY/BcTgsovcJYcXaDc2juBQz0ZRn7IXo0I37LD/77U EL9fQCtPAlXGjVreJzVoLBbZcr32xRVnp3l8NM7BXBj0cSLjLiYdiJtzT+NwT854OpQv At2bcI+BvguA39k050FgME+NhXEgVJ3e1qsWoHrKVHWCqvWEOe3ALUq6UES+yMXdlMtz 5i85+Vqf7EsMLYNOujIZ1JKJVMwE1VPqAiPD4B21E6NihLkDnFLXffilUvikncziwiFP kBdTh3y3phJR0yFOtZS45MPqxof7D5SqKIpH0JuKedPpLnDzCCl1PqZ3cCskI591zZw5 IlFw== X-Gm-Message-State: AMke39mqft8CGrr8xg2s2eIc3ohpu6gJZBz96X3fEIEFWNIqXv5vpul1zScjc9JC9cLX26e68Gx6VP5I3CeSVsP0 X-Received: by 10.200.45.177 with SMTP id p46mr1919854qta.240.1486636140901; Thu, 09 Feb 2017 02:29:00 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ben Slater Date: Thu, 09 Feb 2017 10:28:50 +0000 Message-ID: Subject: Re: DELETE/SELECT with multi-column PK and IN To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a114796e816eb9a0548167310 archived-at: Thu, 09 Feb 2017 10:29:10 -0000 --001a114796e816eb9a0548167310 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable That=E2=80=99s a very good point from Sylvain that I forgot/missed. That sa= id, we=E2=80=99ve seen plenty of scenarios where overall system throughput is i= mproved through unlogged batches. One of my colleagues did quite a bit of benchmarking on this topic for his talk at last year=E2=80=99s C* summit: http://www.slideshare.net/DataStax/microbatching-highperformance-writes-ada= m-zegelin-instaclustr-cassandra-summit-2016 On Thu, 9 Feb 2017 at 20:52 Benjamin Roth wrote: > Ok got it. > > But it's interesting that this is supported: > DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3)); > > This is technically mostly the same (Token awareness, > coordination/routing, read performance, ...), right? > > 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne : > > This is a statement on multiple partitions and there is really no > optimization the code internally does on that. In fact, I strongly advise > you to not use a batch but rather simply do a for loop client side and se= nd > statement individually. That way, your driver will be able to use proper > token-awareness for each request (while if you send a batch, one > coordinator will be picked up and will have to forward most statement, > doing more network hops at the end of the day). The only case where using= a > batch is indeed legit is if you care about all the statement being atomic= , > but in that case it's a logged batch you want. > > That's btw more or less why we never bothered implementing that: it's > totally doable technically, but it's not really such a good idea > performance wise in practice most of the time, and you can easily work it > around with a batch if you need atomicity. > > Which is not saying it will never be and shouldn't be supported btw, ther= e > is something to be said for the consistency of the CQL language in genera= l. > But it's why no-one took time to do it so far. > > On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth > wrote: > > Yes, thats the workaround - I'll try that. > > Would you agree it would be better for internal optimizations to process > this within a single statement? > > 2017-02-09 10:32 GMT+01:00 Ben Slater : > > Yep, that makes it clear. I think an unlogged batch of prepared statement= s > with one statement per PK tuple would be roughly equivalent? And probably > no more complex to generate in the client? > > On Thu, 9 Feb 2017 at 20:22 Benjamin Roth wrote= : > > Maybe that makes it clear: > > DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1, > 3), (2, 3), (3, 4)); > > If want to delete or select a bunch of records identified by their > multi-partitionkey tuples. > > 2017-02-09 10:18 GMT+01:00 Ben Slater : > > Are you looking this to be equivalent to (PK1=3D1 AND PK2=3D2) or are you > looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else? > > Cheers > Ben > > On Thu, 9 Feb 2017 at 20:09 Benjamin Roth wrote= : > > Hi Guys, > > CQL says this is not allowed: > > DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2)); > > 1. Is there a reason for it? There shouldn't be a performance penalty, it > is a PK lookup, the same thing works with a single pk column > 2. Is there a known workaround for it? > > It would be much of a help to have it for daily business, IMHO it's a > waste of resources to run multiple queries just to fetch a bunch of recor= ds > by a PK. > > Thanks in advance for any reply > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 <+49%207161%203048806> =C2=B7 Fax +49 7161 304880= -1 > <+49%207161%203048801> > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > > -- > =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94 > Ben Slater > Chief Product Officer > Instaclustr: Cassandra + Spark - Managed | Consulting | Support > +61 437 929 798 <+61%20437%20929%20798> > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 <+49%207161%203048806> =C2=B7 Fax +49 7161 304880= -1 > <+49%207161%203048801> > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > > -- > =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94 > Ben Slater > Chief Product Officer > Instaclustr: Cassandra + Spark - Managed | Consulting | Support > +61 437 929 798 <+61%20437%20929%20798> > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 <+49%207161%203048806> =C2=B7 Fax +49 7161 304880= -1 > <+49%207161%203048801> > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > > > > > > -- > Benjamin Roth > Prokurist > > Jaumo GmbH =C2=B7 www.jaumo.com > Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany > Phone +49 7161 304880-6 <+49%207161%203048806> =C2=B7 Fax +49 7161 304880= -1 > <+49%207161%203048801> > AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer > --=20 =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94 Ben Slater Chief Product Officer Instaclustr: Cassandra + Spark - Managed | Consulting | Support +61 437 929 798 --001a114796e816eb9a0548167310 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
That=E2=80=99s a very good point from Sylvain that I forgo= t/missed. That said, we=E2=80=99ve seen plenty of scenarios where overall s= ystem throughput is improved through unlogged batches. One of my colleagues= did quite a bit of benchmarking on this topic for his talk at last year=E2= =80=99s C* summit:=C2=A0http://www.slideshare.net/DataStax/microbatching-highperformance-write= s-adam-zegelin-instaclustr-cassandra-summit-2016

On Thu, 9 Feb 2017 at 20:52 Benjamin Roth &l= t;benjamin.roth@jaumo.com>= ; wrote:
Ok got it.

But it's interesting that this is supported:<= /div>
DELETE/SELE= CT=C2=A0FROM=C2=A0ks.cf=C2=A0WHERE=C2=A0(pk= 1)=C2=A0IN=C2=A0((1), (2), (3));
<= span class=3D"m_241651103597689151gmail-m_3568448570428936565gmail-s2 gmail= _msg" style=3D"font-size:12.8px">
This is technically m= ostly the same (Token awareness, coordination/routing, read performance, ..= .), right?

2017-02-09 10:43 GMT+01= :00 Sylvain Lebresne <sylvain@d= atastax.com>:
This is a statement o= n multiple partitions and there is really no optimization the code internal= ly does on that. In fact, I strongly advise you to not use a batch but rath= er simply do a for loop client side and send statement individually. That w= ay, your driver will be able to use proper token-awareness for each request= (while if you send a batch, one coordinator will be picked up and will hav= e to forward most statement, doing more network hops at the end of the day)= . The only case where using a batch is indeed legit is if you care about al= l the statement being atomic, but in that case it's a logged batch you = want.

That's btw more or less why we never bothered implementing th= at: it's totally doable technically, but it's not really such a goo= d idea performance wise in practice most of the time, and you can easily wo= rk it around with a batch if you need atomicity.=C2=A0

Which is n= ot saying it will never be and shouldn't be supported btw, there is som= ething to be said for the consistency of the CQL language in general. But i= t's why no-one took time to do it so far.


2017-02-09 10:32 GMT+01:00 Ben Slater <ben.slater@instaclustr.com>:
Yep, that makes it clear. I think an unlogged batch of pr= epared statements with one statement per PK tuple would be roughly equivale= nt? And probably no more complex to generate in the client?

On Thu, 9 Feb 2017 at 20= :22 Benjamin Roth <benjamin.roth@jaumo.com> wrote:
Maybe that makes it clear:

DELETE=C2=A0FROM=C2= =A0ks.cf=C2=A0WHERE=C2=A0(partitionkey1,=C2=A0partitionkey2)=C2=A0IN=C2= =A0((1, 2), (1, 3), (2, 3), (3, 4));

If w= ant to delete or select a bunch of records identified by their multi-partit= ionkey tuples.

2017-02-09 10:18 GMT+01:00 Ben Sla= ter <ben.slater@instaclustr.com>:
Are you looking this to be equivalent to (PK1=3D1 AND PK2=3D2) or are yo= u looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?

Cheers
Ben

O= n Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benj= amin.roth@jaumo.com> wrote:
Hi Guys,

DELETE FROM ks.cf WHERE (p= k1, pk2) IN ((1, 2));


1. Is there a reason for it? There shouldn't be a performance penalt= y, it is a PK lookup, the same thing works with a single pk column
2. Is there a known workaround for it?
<= div class=3D"m_241651103597689151m_-3449622012333787542m_-11074663182256391= 82m_8458775556515303793m_-1721782468393461891m_3370736347832402932gmail_msg= m_241651103597689151m_-3449622012333787542m_-1107466318225639182m_84587755= 56515303793gmail_msg gmail_msg">
It would be much of a help to have it for daily= business, IMHO it's a waste of resources to run multiple queries just = to fetch a bunch of records by a PK.
=
Than= ks in advance for any reply

--
Benjamin Roth
Prokuri= st

Jaumo Gmb= H =C2=B7 www.jaumo.com
Wehrstra=C3= =9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 = Fax +49 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kamm= erer
--
=E2=80=94= =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support=


<= /div>--
Benjamin Roth
Prokur= ist

Jaumo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6= ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax = +49 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing= Director: Jens Kammerer
--
<= div dir=3D"ltr" class=3D"gmail_msg">=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2= =80=94=E2=80=94=E2=80=94=E2=80=94
Ben Slater
Chief Product Officer
Insta= clustr: Cassandra + Spark - Managed | Consulting | Support



--=
Benjamin Roth
Prokurist

Jau= mo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2= =B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2=B7 Fax +49 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2= =B7 Managing Director: Jens Kammerer




--=
Benjamin Roth
Prokurist

Jaumo GmbH =C2=B7 www.jaumo.com
Wehrstra=C3=9Fe 46 =C2=B7 73035 G=C3=B6ppingen =C2=B7 Germany
Phone +49 7161 304880-6 =C2= =B7 Fax +49 7161 304880-1
AG Ulm =C2=B7 HRB 731058 =C2=B7 Managing Director: Jens Kammerer
--
=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80= =94=E2=80=94=E2=80=94=E2=80=94
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
--001a114796e816eb9a0548167310--