From: Brian O'Neill
Reply-To: user@cassandra.apache.org
Date: Wed, 22 Apr 2015 08:17:18 -0400
Subject: Re: Adhoc querying in Cassandra?

Again, agreed.

They have different usage patterns (C* is write-heavy, ES is read-heavy), so I would separate them.

Solr should be sufficient. I believe DSE is a tight integration of Solr and C*.

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile @boneill42

This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited.

From: Ali Akhtar
Reply-To: user@cassandra.apache.org
Date: Wednesday, April 22, 2015 at 8:10 AM
To: user@cassandra.apache.org
Subject: Re: Adhoc querying in Cassandra?

I believe ElasticSearch has better support for scaling horizontally (by adding nodes) than Solr does. Some benchmarks that I've looked at also show it performing better under high load.

I probably wouldn't run them both on the same node, or you might see low performance as they compete for resources.

What type of usage do you expect: mostly read, or mostly write?

On Wed, Apr 22, 2015 at 5:06 PM, Matthew Johnson wrote:

> Hi Ali, Brian,
>
> Thanks for the suggestion. We have previously used Solr (SolrCloud for
> distribution) for a lot of other products; presumably this will do the same
> job as ElasticSearch? Or does ElasticSearch have specifically better
> integration with Cassandra or better support for aggregate queries?
>
> Would it be an OK architecture to have a Cassandra node and a Solr/ES
> instance on each box, so they scale together? Or is it better to have
> separate servers for storage and search?
>
> Cheers,
> Matt
>
> From: Brian O'Neill [mailto:boneill42@gmail.com] On Behalf Of Brian O'Neill
> Sent: 22 April 2015 12:56
> To: user@cassandra.apache.org
> Subject: Re: Adhoc querying in Cassandra?
>
> +1, I think many organizations (including ours) pair Elastic Search with
> Cassandra.
>
> Use Cassandra as your system of record, then index the data with ES.
>
> -brian
>
> From: Ali Akhtar
> Reply-To: user@cassandra.apache.org
> Date: Wednesday, April 22, 2015 at 7:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Adhoc querying in Cassandra?
>
> You might find it better to use elasticsearch for your aggregate queries
> and analytics. Cassandra is more of just a data store.
>
> On Apr 22, 2015 4:42 PM, "Matthew Johnson" wrote:
>
> Hi all,
>
> Currently we are setting up a "big" data cluster. We are only going to have
> a couple of servers to start with, but we need to be able to scale out
> quickly when usage ramps up. Previously we have used Hadoop/HBase for our
> big data cluster, but since we are starting this one on only two nodes I
> think Cassandra will be a much better fit, as Hadoop and HBase really need
> at least 3 to achieve any sort of resilience (ZooKeeper quorum etc.).
>
> My question is this:
>
> I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me
> to issue ad hoc SQL-style queries (e.g. count the number of times users have
> clicked on a certain button after clicking a different button in the last 3
> weeks). My understanding is that CQL does not support this style of ad hoc
> aggregate querying out of the box. Is there a recommended way to do count,
> sum, average etc. without writing client code (in my case Java) every time
> I want to run one? I have been looking at projects like Drill, Spark etc.
> that could potentially sit on top of Cassandra, but without actually
> setting everything up and testing them it is difficult to figure out what
> they would give us.
>
> Does anyone else interactively issue ad hoc aggregate queries to Cassandra,
> and if so, what stack do you use?
>
> Thanks!
> Matt
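
A minimal Python sketch of the pattern suggested in this thread: keep the authoritative copy of each row in one store (Cassandra in the thread) and send a copy to a separate search index (Elasticsearch or Solr) that answers the ad hoc aggregate queries. `RecordStore`, `SearchIndex`, and `save_event` are hypothetical in-memory stand-ins invented for illustration; they are not the API of Cassandra, Elasticsearch, or Solr. A real setup would use the respective client libraries and would have to decide what happens when the index write fails.

```python
from collections import Counter


class RecordStore:
    """Hypothetical stand-in for the system of record (Cassandra in the thread)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, row):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._rows[key] = dict(row)

    def get(self, key):
        return self._rows[key]


class SearchIndex:
    """Hypothetical stand-in for the search index (ES or Solr in the thread)."""

    def __init__(self):
        self._docs = {}

    def index(self, key, doc):
        # Re-indexing the same key overwrites the earlier document.
        self._docs[key] = dict(doc)

    def count_by(self, field):
        # Ad hoc aggregate: number of indexed documents per value of `field`.
        return Counter(doc[field] for doc in self._docs.values())


def save_event(key, event, store, index):
    # Write the authoritative copy first; the index is derived data
    # and could be rebuilt from the store if it is lost.
    store.put(key, event)
    index.index(key, event)


store, index = RecordStore(), SearchIndex()
save_event("e1", {"user": "u1", "button": "buy"}, store, index)
save_event("e2", {"user": "u2", "button": "buy"}, store, index)
save_event("e3", {"user": "u1", "button": "help"}, store, index)
clicks_per_button = index.count_by("button")
```

The point of the split is that reads by key go to the record store while counts, sums, and similar aggregates go to the index, which matches the advice above to run the two on separate servers because their load patterns differ.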