Subject: Re: Spark SQL Vs CQL performance on Cassandra
From: Peter Lin
To: "user@cassandra.apache.org"
Date: Thu, 11 Dec 2014 08:15:49 -0500

Spark is an in-memory architecture, so you're not going to see it beat CQL on a simple select from one table on a few keys. Where you'll see a benefit is in loading lots of data into memory and running "report-like" queries that join data from multiple tables.

On Thu, Dec 11, 2014 at 8:09 AM, Ajay wrote:
> Hi,
>
> To test Spark SQL vs CQL performance on Cassandra, I did the following:
>
> 1) Cassandra standalone server (1 server in a cluster)
> 2) Spark Master and 1 Worker
> Both running on a ThinkPad laptop with 4 cores and 8GB RAM.
> 3) Wrote Spark SQL code using the Cassandra-Spark Driver from Cassandra
> (JavaApiDemo.java. Run with spark://127.0.0.1:7077 127.0.0.1)
> 4) Wrote CQL code using the Java driver from Cassandra
> (CassandraJavaApiDemo.java)
> In both cases, I create 1 million rows and query for 1.
>
> Observation:
> 1) It takes less than 10 milliseconds using CQL (SELECT * FROM users WHERE
> name='Anna')
> 2) It takes around 0.6 seconds using Spark (either SELECT * FROM users WHERE
> name='Anna' or javaFunctions(sc).cassandraTable("test", "people",
> mapRowTo(Person.class)).where("name=?", "Anna"))
>
> Please let me know if I am missing something in the Spark configuration or
> the Cassandra-Spark Driver.
>
> Thanks
> Ajay Garga
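One caveat on the timings in the question: a single timed call also pays one-time costs such as JIT warm-up and connection setup, plus whatever per-query setup Spark does before any rows are read. The sketch below is one way to get steadier figures, assuming each query can be wrapped in a java.util.function.Supplier. QueryTimer, medianMillis and median are hypothetical names for illustration, not part of JavaApiDemo.java or either driver. It runs untimed warm-up calls, then reports the median of the timed runs.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical latency harness: untimed warm-up calls, then the median of timed calls. */
final class QueryTimer {

    // Results are written here so the JIT cannot treat the timed call as dead code.
    private static volatile Object sink;

    private QueryTimer() {}

    /** Runs query `warmups` times untimed, then `runs` times timed; returns median latency in ms. */
    static <T> double medianMillis(Supplier<T> query, int warmups, int runs) {
        if (warmups < 0 || runs <= 0) {
            throw new IllegalArgumentException("need warmups >= 0 and runs > 0");
        }
        // Warm-up: lets the JIT compile hot paths and lets the client finish
        // any one-time setup before anything is measured.
        for (int i = 0; i < warmups; i++) {
            sink = query.get();
        }
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = query.get();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return median(samples);
    }

    /** Median of the samples; the input array is left unmodified. */
    static double median(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle element. Even count: mean of the two middle elements.
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For example, assuming `session` is a connected DataStax Java driver Session, `QueryTimer.medianMillis(() -> session.execute("SELECT * FROM users WHERE name='Anna'").all(), 5, 20)` would time the CQL path. The Spark path can be wrapped the same way by calling `.collect()` on the RDD, since RDDs are lazy and nothing is read until an action runs. The median is used rather than the mean so that a single slow outlier, such as a GC pause, does not dominate.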
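CQL has no JOIN clause, so a "report-like" query over several tables has to combine rows after they are read, which is the kind of work Spark is suited to. As a single-JVM illustration only (not Spark and not the connector's API), here is a hash join of two hypothetical tables, users and orders, keyed on the user's name. The record types and the InMemoryJoin class are assumptions made for the example; Spark SQL can run comparable joins across workers over data loaded from Cassandra.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-JVM hash join, standing in for a "report-like" multi-table query. */
final class InMemoryJoin {

    // Hypothetical row types standing in for two Cassandra tables.
    record User(String name, String city) {}
    record Order(String userName, double amount) {}
    record UserOrder(String name, String city, double amount) {}

    private InMemoryJoin() {}

    /** Inner join of users and orders on User.name == Order.userName. */
    static List<UserOrder> join(List<User> users, List<Order> orders) {
        // Build phase: index users by the join key.
        // Assumes name is unique in users; a duplicate name keeps only the last row.
        Map<String, User> byName = new HashMap<>();
        for (User u : users) {
            byName.put(u.name(), u);
        }
        // Probe phase: look up each order's user; orders with no matching user are dropped.
        List<UserOrder> out = new ArrayList<>();
        for (Order o : orders) {
            User u = byName.get(o.userName());
            if (u != null) {
                out.add(new UserOrder(u.name(), u.city(), o.amount()));
            }
        }
        return out;
    }
}
```

Indexing the smaller side (here, users) keeps the hash table bounded by that table's size while the larger side is streamed through once.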