Subject: Re: Possible to Add multiple columns in one query ?
From: Colin
Date: Sun, 25 May 2014 14:01:43 -0500
To: "user@cassandra.apache.org"

Try asynchronous updates: collect the futures in groups of 1,000 and tune from there.

Also, in the real world, you'd want to use load-balancing and token-aware policies when connecting to the cluster. A token-aware policy sends each write straight to a node that owns the partition, so that replica acts as the coordinator and the extra network hop is avoided.

I will post a link to my GitHub with an example when I get off the road.

--
Colin
320-221-9531

> On May 25, 2014, at 1:56 PM, "Jack Krupansky" wrote:
>
> Typo: I presume “channelid” should be “tagid” for the partition key for your table.
>
> Yes, BATCH statements are the way to go, but be careful not to make your batches too large; otherwise you could lose performance when Cassandra sits relatively idle while the batch slowly streams in to the coordinator node over the network. Better to break up a large batch into multiple moderate-size batches (exact size and number will vary and need testing to determine) that will transmit quicker and can be executed in parallel.
>
> I’m not sure Cassandra on a laptop would be the best measure of performance for a real cluster, especially compared to a server with more CPU cores than your laptop.
>
> And for a real cluster, rows with different partition keys can be sent to a coordinator node that owns that partition key, which could be multiple nodes for RF>1.
>
> -- Jack Krupansky
>
> From: Mark Farnan
> Sent: Sunday, May 25, 2014 9:36 AM
> To: user@cassandra.apache.org
> Subject: Possible to Add multiple columns in one query ?
>
> I’m sure this is a CQL 101 question, but:
>
> Is it possible to add MULTIPLE rows/columns to a single partition in a single CQL 3 query / call?
>
> Need:
> I’m trying to find the most efficient way to add multiple time series events to a table in a single call.
> Whilst most time series data comes in sequentially, we have a case where it is often loaded in bulk, say 100,000 points sent for 50 channels/tags in one go (sometimes more), and this needs to be loaded as quickly and efficiently as possible.
>
> Fairly standard time-series schema (this is for testing purposes only at this point, and doesn’t represent final schemas):
>
> CREATE TABLE tag (
>   tagid int,
>   idx timestamp,
>   value double,
>   PRIMARY KEY (channelid, idx)
> ) WITH CLUSTERING ORDER BY (idx DESC);
>
> Currently I’m using Batch statements, but even that is not fast enough.
>
> Note: At this point I’m testing on a single-node cluster on a laptop, to compare different versions.
>
> We are using the DataStax C# 2.0 (beta) client and Cassandra 2.0.7.
>
> Regards
> Mark.
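
For illustration, the "collect the futures at 1,000" suggestion in this reply could be sketched roughly as below. This is a driver-independent Python sketch, not the C# client's API: `write_point` is a hypothetical stand-in for whatever asynchronous INSERT call the driver provides, the thread pool only simulates asynchronous execution, and the window of 1,000 is just the starting point mentioned in the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

WINDOW = 1000  # futures collected before waiting; starting value from the thread, tune by testing


def write_point(point):
    """Hypothetical stand-in for a driver's asynchronous INSERT; returns the point it 'wrote'."""
    return point


def windows(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def load(points, submit, window=WINDOW):
    """Submit writes asynchronously, draining each window of futures before sending more."""
    written = 0
    for chunk in windows(points, window):
        futures = [submit(p) for p in chunk]  # fire off up to `window` writes
        for future in futures:
            future.result()  # blocks until done and re-raises any write error
            written += 1
    return written


if __name__ == "__main__":
    # 50 tags x 100 points each, shaped like the (tagid, idx, value) rows in the schema
    points = [(tagid, idx, float(idx)) for tagid in range(50) for idx in range(100)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        total = load(points, lambda p: pool.submit(write_point, p))
    print(total)
```

Bounding the number of outstanding futures keeps memory use and server-side queueing in check; whether 1,000 is a good window depends on the cluster and would need measuring.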
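
As a rough illustration of the advice quoted above to break a large load into several moderate-size batches, the following Python sketch splits rows into batches that each target a single partition. Grouping by `tagid` is an assumption added here (the thread only says that rows with different partition keys may go to different nodes), `partition_batches` is a hypothetical helper, and the limit of 500 rows per batch is an arbitrary starting value that would need testing.

```python
from collections import defaultdict

MAX_BATCH_ROWS = 500  # assumed starting value; the right size has to be found by testing


def partition_batches(points, max_rows=MAX_BATCH_ROWS):
    """Group (tagid, idx, value) rows by tagid, then split each group into batches of at most max_rows."""
    by_tag = defaultdict(list)
    for tagid, idx, value in points:
        by_tag[tagid].append((tagid, idx, value))

    batches = []
    for rows in by_tag.values():
        # every batch holds rows for exactly one partition key
        for start in range(0, len(rows), max_rows):
            batches.append(rows[start:start + max_rows])
    return batches


if __name__ == "__main__":
    # 50 tags x 2,000 points each = 100,000 rows, similar in scale to the bulk load described
    points = [(tagid, idx, 0.0) for tagid in range(50) for idx in range(2000)]
    batches = partition_batches(points)
    print(len(batches))  # 200 batches of 500 rows
```

Each resulting list could then be sent as one BATCH, and the batches themselves could be issued in parallel rather than one after another.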