From: Ryan Svihla
To: user@cassandra.apache.org
Date: Thu, 18 Dec 2014 06:08:43 -0600
Subject: Re: Cassandra for Analytics?

I'd dispute the claim that Cassandra has higher read latency than HBase.
I'm not sure what experience you have with both, and that may have been
true at one point, but with Leveled Compaction Strategy and proper JVM
tuning I don't see how it still holds; read latency should be at least
comparable. I've worked with clusters configured to use the buffer cache
where the 99th percentile read latency is under 400 microseconds.

Spark and Cassandra combined are a common fit for real-time analytics,
and Ooyala has been doing this for some time. There are a number of
YouTube videos where they talk about it:
https://www.youtube.com/watch?v=PjZp7K5z7ew

On Wed, Dec 17, 2014 at 10:20 PM, Ajay <ajay.garga@gmail.com> wrote:
>
> Hi,
>
> Can Cassandra be used for, or is it a good fit for, real-time
> analytics? I went through a couple of benchmarks comparing Cassandra
> and HBase (most of them done 3 years ago), and they mentioned that
> Cassandra is designed for intensive writes and has higher read latency
> than HBase. In our case, we will have both writes and reads, with more
> reads (say 40% writes and 60% reads). We are planning to use Spark as
> the in-memory computation engine.
>
> Thanks
> Ajay
>

--
Ryan Svihla
Solution Architect

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.