Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2D01011CA3 for ; Sun, 6 Jul 2014 06:04:59 +0000 (UTC) Received: (qmail 2780 invoked by uid 500); 6 Jul 2014 06:04:56 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 2751 invoked by uid 500); 6 Jul 2014 06:04:56 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 2741 invoked by uid 99); 6 Jul 2014 06:04:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jul 2014 06:04:56 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ssrameez@gmail.com designates 209.85.192.173 as permitted sender) Received: from [209.85.192.173] (HELO mail-pd0-f173.google.com) (209.85.192.173) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jul 2014 06:04:53 +0000 Received: by mail-pd0-f173.google.com with SMTP id r10so3698534pdi.32 for ; Sat, 05 Jul 2014 23:04:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=94o89rSucCq35nfARIShQnnWMgUdW57Kk9svPGzcoFE=; b=c1pF5bTT4VtIQ/+05c91IS9LgLn9nicQle7E+RHUZZFaycDsykow+dTbKoFJQ6IRD/ zqFmBCk/lNTB6GfLGImYOUieUm9H4vBjbB/2nETPmFg2mDr5T5AB5tPuZz3lhYgUBJBM 5KjLSEuzyHtayuzzRmqvLjDTTSx4n0JcjTP+J+LLGXRCpveyJ6BchGNnMwsMmP6hoUYb dFYH8xwRc239gpACioBzwmegxYk8Fx+TYbwvW2VYxZ71t7cfQlRevVZXa1E+0oYy/aPs RY/qBwk3zxqjJ0YE7pQ0dRLHeNCZLPoFMXiNeh5CuVJ3mEZyxCAwDQ8PmMpj37CzW4lf Rzjw== MIME-Version: 1.0 X-Received: by 10.67.2.34 with SMTP id bl2mr20928292pad.58.1404626668560; Sat, 05 Jul 2014 23:04:28 -0700 (PDT) Received: by 10.70.69.133 with HTTP; Sat, 5 Jul 2014 23:04:26 -0700 (PDT) Received: by 10.70.69.133 with HTTP; Sat, 5 Jul 2014 23:04:26 -0700 (PDT) In-Reply-To: <1404504452643.f915c458@Nodemailer> References: <1404504452643.f915c458@Nodemailer> Date: Sun, 6 Jul 2014 11:34:26 +0530 Message-ID: Subject: Re: How to perform Range Queries in Cassandra From: Rameez Thonnakkal To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=047d7b15a49d9f27fb04fd80212f X-Virus-Checked: Checked by ClamAV on apache.org --047d7b15a49d9f27fb04fd80212f Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Won't the performeance improve significantly if you increase the number of nodes even in a commodity hardware profile. On 5 Jul 2014 01:38, "Jens Rantil" wrote: > Hi Mike, > > To learn get subsecond performance on your queries using _any_ database > you need to use proper indexing. Like Jeremy said, Solr will do this. > > If you'd like to try to solve this using Cassandra you need to learn the > difference between partition and clustering in your primary key and > understand you need a clustering to do any kind of range query. > > Also, COUNTs in Cassandra are generally fairly slow. > > Cheers, > Jens > =E2=80=94 > Sent from Mailbox > > > On Tue, Jun 24, 2014 at 10:09 AM, Mike Carter wrote= : > >> Hello! >> >> >> I'm a beginner in C* and I'm quite struggling with it. >> >> I=E2=80=99d like to measure the performance of some Cassandra-Range-Quer= ies. The >> idea is to execute multidimensional range-queries on Cassandra. E.g. the= re >> is a given table of 1million rows with 10 columns and I like to execute >> some queries like =E2=80=9Cselect count(*) from testable where d=3D1 and= v1<10 and v2 >> >20 and v3 <45 and v4>70 =E2=80=A6 allow filtering=E2=80=9D. This kind = of queries is very >> slow in C* and soon the tables are bigger, I get a read-timeout probably >> caused by long scan operations. >> >> In further tests I like to extend the dimensions to more than 200 >> hundreds and the rows to 100millions, but actually I can=E2=80=99t handl= e this >> small table. Should reorganize the data or is it impossible to perform s= uch >> high multi-dimensional queries on Cassandra? >> >> >> >> >> >> The setup: >> >> Cassandra is installed on a single node with 2 TB disk space and 180GB >> Ram. >> >> Connected to Test Cluster at localhost:9160. >> >> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.= 0] >> >> >> >> Keyspace: >> >> CREATE KEYSPACE test WITH replication =3D { >> >> 'class': 'SimpleStrategy', >> >> 'replication_factor': '1' >> >> }; >> >> >> >> >> >> Table: >> >> CREATE TABLE testc21 ( >> >> key int, >> >> d int, >> >> v1 int, >> >> v10 int, >> >> v2 int, >> >> v3 int, >> >> v4 int, >> >> v5 int, >> >> v6 int, >> >> v7 int, >> >> v8 int, >> >> v9 int, >> >> PRIMARY KEY (key) >> >> ) WITH >> >> bloom_filter_fp_chance=3D0.010000 AND >> >> caching=3D'ROWS_ONLY' AND >> >> comment=3D'' AND >> >> dclocal_read_repair_chance=3D0.000000 AND >> >> gc_grace_seconds=3D864000 AND >> >> index_interval=3D128 AND >> >> read_repair_chance=3D0.100000 AND >> >> replicate_on_write=3D'true' AND >> >> populate_io_cache_on_flush=3D'false' AND >> >> default_time_to_live=3D0 AND >> >> speculative_retry=3D'99.0PERCENTILE' AND >> >> memtable_flush_period_in_ms=3D0 AND >> >> compaction=3D{'class': 'SizeTieredCompactionStrategy'} AND >> >> compression=3D{'sstable_compression': 'LZ4Compressor'}; >> >> >> >> CREATE INDEX testc21_d_idx ON testc21 (d); >> >> >> >> select * from testc21 limit 10; >> >> key | d | v1 | v10 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 >> >> --------+---+----+-----+----+----+-----+----+----+----+----+----- >> >> 302602 | 1 | 56 | 55 | 26 | 45 | 67 | 75 | 25 | 50 | 26 | 54 >> >> 531141 | 1 | 90 | 77 | 86 | 42 | 76 | 91 | 47 | 31 | 77 | 27 >> >> 693077 | 1 | 67 | 71 | 14 | 59 | 100 | 90 | 11 | 15 | 6 | 19 >> >> 4317 | 1 | 70 | 77 | 44 | 77 | 41 | 68 | 33 | 0 | 99 | 14 >> >> 927961 | 1 | 15 | 97 | 95 | 80 | 35 | 36 | 45 | 8 | 11 | 100 >> >> 313395 | 1 | 68 | 62 | 56 | 85 | 14 | 96 | 43 | 6 | 32 | 7 >> >> 368168 | 1 | 3 | 63 | 55 | 32 | 18 | 95 | 67 | 78 | 83 | 52 >> >> 671830 | 1 | 14 | 29 | 28 | 17 | 42 | 42 | 4 | 6 | 61 | 93 >> >> 62693 | 1 | 26 | 48 | 15 | 22 | 73 | 94 | 86 | 4 | 66 | 63 >> >> 488360 | 1 | 8 | 57 | 86 | 31 | 51 | 9 | 40 | 52 | 91 | 45 >> >> Mike >> > > --047d7b15a49d9f27fb04fd80212f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Won't the performeance improve significantly if you incr= ease the number of nodes even in a commodity hardware profile.

On 5 Jul 2014 01:38, "Jens Rantil" <= ;jens.rantil@tink.se> wrote:<= br type=3D"attribution">
Hi Mike,

To learn get subsecond performance on your queries using _any_ databas= e you need to use proper indexing. Like Jeremy said, Solr will do this.

If you'd like to try to solve this using Cassandra you need to lea= rn the difference between partition and clustering in your primary key and = understand you need a clustering to do any kind of range query.

Also, COUNTs in Cassandra are generally fairly slow.

Cheers,
Jens
=E2=80=94
Sent from Mailbox


On Tue, Jun 24, 2014 at 10:09 AM, Mik= e Carter <jalooser2@gmail.com> wrote:

Hello!


I'm a beginner in C* and I'm= quite struggling with it.

I=E2=80=99d like to measure the performance of some Ca= ssandra-Range-Queries. The idea is to execute multidimensional range-querie= s on Cassandra. E.g. there is a given table of 1million rows with 10 column= s and I like to execute some queries like =E2=80=9Cselect count(*) from tes= table where d=3D1 and v1<10 and v2 >20 and v3 <45 and v4>70 =E2= =80=A6 allow filtering=E2=80=9D. =C2=A0This kind of queries is very slow in= C* and soon the tables are bigger, I get a read-timeout probably caused by= long scan operations.

In further tests I like to extend th= e dimensions to more than 200 hundreds and the rows to 100millions, but act= ually I can=E2=80=99t handle this small table. Should reorganize the data o= r is it impossible to perform such high multi-dimensional queries on Cassan= dra?

=C2=A0

=C2=A0

The setup:

Cassandra is installed on a single node with 2 TB disk= space and 180GB Ram.

Connected to Test Cluster at localho= st:9160.

[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL= spec 3.1.1 | Thrift protocol 19.39.0]

=C2=A0

Keyspace:

CREATE KEYSPACE test WITH replication =3D {<= /u>

=C2=A0 'class': 'SimpleS= trategy',

=C2=A0 'replication_factor':= '1'

};

=C2=A0

=C2=A0

Table:

CREATE TABLE testc21 (

=C2=A0 key int,=

=C2=A0 d int,

=C2=A0 v1 int,<= /p>

=C2=A0 v10 int,

=C2=A0 v2 int,<= /p>

=C2=A0 v3 int,

=C2=A0 v4 int,<= /p>

=C2=A0 v5 int,

=C2=A0 v6 int,<= /p>

=C2=A0 v7 int,

=C2=A0 v8 int,<= /p>

=C2=A0 v9 int,

=C2=A0 PRIMARY KEY (key)

) WITH

=C2=A0 bloom_filter_fp_chance=3D0.01= 0000 AND

=C2=A0 caching=3D'ROWS_ONLY' AND=

=C2=A0 comment=3D'' AND

=C2=A0 dclocal_read_repair_chance=3D= 0.000000 AND

=C2=A0 gc_grace_seconds=3D864000 AND

=C2=A0 index_interval=3D128 AND

=C2=A0 read_repair_chance=3D0.100000= AND

=C2=A0 replicate_on_write=3D'true' AND<= u>

=C2=A0 populate_io_cache_on_flush=3D= 'false' AND

=C2=A0 default_time_to_live=3D0 AND<= u>

=C2=A0 speculative_retry=3D'99.0PERCENTILE' AN= D

=C2=A0 memtable_flush_period_in_ms= =3D0 AND

=C2=A0 compaction=3D{'class'= : 'SizeTieredCompactionStrategy'} AND

=C2=A0 compression=3D{'sstable_compression': &= #39;LZ4Compressor'};

=C2=A0

CREATE INDEX testc21_d_idx ON testc2= 1 (d);

=C2=A0

select * from testc21 limit 10;

=

key=C2=A0=C2=A0=C2=A0 | d | v1 | v10= | v2 | v3 | v4=C2=A0 | v5 | v6 | v7 | v8 | v9

--------+---+----+-----+----+----+--= ---+----+----+----+----+-----

=C2=A0302602 | 1 | 56 |=C2=A0 55 | 26 | 45 |=C2=A0 67 = | 75 | 25 | 50 | 26 |=C2=A0 54

=C2=A0531141 | 1 | 90 |=C2=A0 77 | 8= 6 | 42 |=C2=A0 76 | 91 | 47 | 31 | 77 |=C2=A0 27

=C2=A0693077 | 1 | 67 |=C2=A0 71 | 1= 4 | 59 | 100 | 90 | 11 | 15 |=C2=A0 6 |=C2=A0 19

=C2=A0=C2=A0 4317 | 1 | 70 |=C2=A0 77 | 44 | 77 |=C2= =A0 41 | 68 | 33 |=C2=A0 0 | 99 |=C2=A0 14

=C2=A0927961 | 1 | 15 |=C2=A0 97 | 9= 5 | 80 |=C2=A0 35 | 36 | 45 |=C2=A0 8 | 11 | 100

=C2=A0313395 | 1 | 68 |=C2=A0 62 | 5= 6 | 85 |=C2=A0 14 | 96 | 43 |=C2=A0 6 | 32 |=C2=A0=C2=A0 7

=C2=A0368168 | 1 |=C2=A0 3 |=C2=A0 63 | 55 | 32 |=C2= =A0 18 | 95 | 67 | 78 | 83 |=C2=A0 52

=C2=A0671830 | 1 | 14 |=C2=A0 29 | 2= 8 | 17 |=C2=A0 42 | 42 |=C2=A0 4 |=C2=A0 6 | 61 |=C2=A0 93

=C2=A0 62693 | 1 | 26 |=C2=A0 48 | 1= 5 | 22 |=C2=A0 73 | 94 | 86 |=C2=A0 4 | 66 |=C2=A0 63<= /p>

=C2=A0488360 | 1 |=C2=A0 8 |=C2=A0 57 | 86 | 31 |=C2= =A0 51 |=C2=A0 9 | 40 | 52 | 91 |=C2=A0 45

Mike


--047d7b15a49d9f27fb04fd80212f--