From: "Jack Krupansky"
Reply-To: user@cassandra.apache.org
Subject: Re: Cassandra vs Elasticsearch.
Date: Sat, 3 May 2014 08:58:41 -0400

DataStax Enterprise integrates Cassandra and Apache Solr, with Solr as a secondary index so that the Solr query index can be kept in sync with the Cassandra data automatically and even fully reindexed if your index mapping changes, as a single request. So, C* provides the fully distributed, durable data store, and embedded Solr provides full-featured rich query, including faceting, sorting, grouping, and full keyword text and wildcard and fuzzy and range queries.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

Elasticsearch and Solr are both based on Lucene for the core underlying indexing and query layers.

-- Jack Krupansky

From: Jon Haddad
Sent: Saturday, May 3, 2014 4:03 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra vs Elasticsearch.

Agreed w/ ES not being the durable data store. I would recommend treating it as ephemeral, and using Cassandra as your source of truth. Keep in mind if you change your ES index mapping, you'll require a full reindex in order to search the data properly. It's not like adding a secondary index w/ a DB, where it'll go back and take care of it for you.

Jon

On May 3, 2014, at 12:31 AM, DuyHai Doan wrote:

Hello Tim

You're absolutely right about ES for the query part. This is the perfect fit for complex queries. Now regarding your question:

"What advantages does Cassandra give me over ES?" --> linear scalability & durability. ES is just a super index cluster. I've talked to ES guys. If they do not sell ES right now as a "database for complex search" it's because there is no strong guarantee about durability for your data. Many people just live with it and it's fine. Also, if you store the original data and just pump it into ES it's also fine.

On Sat, May 3, 2014 at 9:14 AM, Tim Uckun wrote:

Hey all.

I have been trying out some data stores for time series data and Cassandra was the first on my list because so many people are using it for the same purpose. I have read many articles on how to model my time series data and tried several variations of schemas which I thought made sense for my data but I have really struggled to run some complex queries I need to run. This has led me down a kind of a rabbit hole of trying to create various "materialized views" and shotgunning the data into multiple tables which might be able to run my queries.

In the mean time I also took the same data and pumped it into Elasticsearch and was able to run almost all the queries I needed without doing anything fancy. Just put the data in, and run your query. The new aggregations in ES are pretty slick although they don't seem to be 100% accurate compared to running the same query in Postgres.

My question is this. What advantages does Cassandra give me over ES? Does it compact the data better? Is it faster to query once your data sizes are huge? Does it use less bandwidth? Is it easier to administer?

I know there must be very compelling reasons to use C* because so many companies are depending on it for their bread and butter so I'd love to hear your take.

Thanks.
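
As a rough illustration of the pattern recommended in this thread (keep the durable store as the source of truth, treat the search index as ephemeral, and rebuild the index in full when the mapping changes), here is a minimal Python sketch. Everything in it is an assumption made for illustration: `SourceOfTruth`, `SearchIndex`, and `rebuild_index` are hypothetical in-memory stand-ins, not the Cassandra, Elasticsearch, or Solr client APIs, and the tokenizer function only plays the role of an index mapping.

```python
from collections import defaultdict


class SourceOfTruth:
    """Hypothetical stand-in for the durable store (Cassandra in the thread)."""

    def __init__(self):
        self.rows = {}

    def write(self, key, text):
        self.rows[key] = text

    def scan(self):
        # Full scan of the stored rows, used only when rebuilding the index.
        return self.rows.items()


class SearchIndex:
    """Hypothetical stand-in for the disposable search index (ES in the thread)."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # plays the role of the index mapping
        self.postings = defaultdict(set)

    def index(self, key, text):
        for token in self.tokenizer(text):
            self.postings[token].add(key)

    def search(self, term):
        return sorted(self.postings.get(term, set()))


def rebuild_index(store, tokenizer):
    """Discard the old index and re-feed every row from the source of truth."""
    fresh = SearchIndex(tokenizer)
    for key, text in store.scan():
        fresh.index(key, text)
    return fresh


store = SourceOfTruth()
store.write("evt-1", "Disk latency HIGH")
store.write("evt-2", "disk replaced")

# First "mapping": case-sensitive whitespace tokens.
index = rebuild_index(store, str.split)
case_sensitive_hits = index.search("disk")

# A changed "mapping" (lowercasing) requires a full rebuild from the store.
index = rebuild_index(store, lambda text: text.lower().split())
lowercased_hits = index.search("disk")
```

Because the index is always derived from the store, losing it or changing how it tokenizes costs only a rebuild, never the original data. In this toy run the first search matches only `evt-2`, and after the rebuild with the lowercasing tokenizer it matches both events.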