From: "Hartzman, Leslie"
To: "user@cassandra.apache.org"
Subject: RE: Ad-hoc queries question
Date: Fri, 20 Sep 2013 23:41:58 +0000

By ad-hoc queries I mean exactly what you've described: the need to access data from multiple column families, which RDBs typically address with JOINs.

I haven't really become familiar enough with MapReduce yet, so I'll have to delve deeper into that. I'm hoping that the de-normalized nature of things would obviate the need for complex subquery-type operations.

From: Peter Lin [mailto:woolfel@gmail.com]
Sent: Friday, September 20, 2013 4:30 PM
To: user@cassandra.apache.org
Subject: Re: Ad-hoc queries question

What do you mean by ad-hoc queries?

Most NoSQL databases do not support cross-table joins, due to their distributed nature. If we compare this to partitioned databases in the RDB world, cross-partition joins there are also more expensive than joins in non-partitioned databases.

You can do ad-hoc queries on a single table as long as the columns have secondary indexes defined. You can do multi-table joins using MapReduce, or by using CQL and handling the join logic in your application. In some cases, you can use the concept of summary tables to speed up complex multi-table ad-hoc queries that have nasty joins. One thing that is very hard to do with all NoSQL databases is complex correlated subqueries. For those kinds of use cases, MapReduce is the "preferred" technique.

For comparison, databases like Oracle RAC distribute table indexes and perform index joins to speed up complex multi-table joins. The downside is that a full Oracle RAC is very expensive and has a high up-front cost.

On Fri, Sep 20, 2013 at 7:20 PM, Hartzman, Leslie <leslie.d.hartzman@medtronic.com> wrote:

Thanks Rob. I thought that might have been the situation but wasn't sure. So does this negate the use of cqlsh to do this then? I'd hate to have to provide custom code to support ad-hoc queries.

Les

From: Robert Coli [mailto:rcoli@eventbrite.com]
Sent: Friday, September 20, 2013 4:06 PM
To: user@cassandra.apache.org
Subject: Re: Ad-hoc queries question

On Fri, Sep 20, 2013 at 3:25 PM, Hartzman, Leslie <leslie.d.hartzman@medtronic.com> wrote:

So are ad-hoc queries more awkward or not feasible?

Yes.

To expand slightly, you will probably end up querying multiple columnfamilies and doing the ad-hoc JOIN-esque aspect in application code.

=Rob

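Rob's suggestion of querying multiple column families and doing the JOIN-esque part in application code could look roughly like the sketch below. It is only an illustration under stated assumptions: the `users` and `orders` rows are made-up stand-ins for the results of two separate column family queries, and `hash_join` is a hypothetical helper, not part of Cassandra, cqlsh, or any driver.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of row dicts on `key`, entirely in application code."""
    # Build an in-memory index of the right-hand rows, grouped by join key.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        # .get() avoids creating empty entries for keys with no match.
        for right in index.get(left[key], []):
            merged = dict(right)
            merged.update(left)
            joined.append(merged)
    return joined


# Hypothetical rows, as if fetched by two independent single-table queries.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]

result = hash_join(users, orders, "user_id")
for row in result:
    print(row)
```

This keeps one side of the join in memory, so it only suits cases where at least one result set is small; for larger joins the thread's other suggestions (summary tables or MapReduce) are probably the better fit.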