From: Francisco Nogueira Calmon Sobral
To: user@cassandra.apache.org
Subject: Re: How to perform range queries efficiently?
Date: Mon, 2 Sep 2013 15:37:55 -0300

We had some problems when using secondary indexes because of three issues:

- The query is a range query, which means that it is slow.
- There is an open bug regarding the use of the row cache for secondary indexes (CASSANDRA-4973).
- The cardinality of our secondary key was very low (this was bad).

We made some modifications and created another column family, which maps the secondary index value to the key of the original column family. The improvements were very impressive in our case!

Best regards,
Francisco

On Aug 28, 2013, at 12:22 PM, Vivek Mishra wrote:

> Create a column family with a compositeType (or PRIMARY KEY) of (user_id, age, salary).
>
> Then you will be able to query using the eq operator over the partition key as well as over the clustering key.
>
> You may also leave salary out of the clustering key (e.g. age, salary) and put a secondary index on it instead.
>
> I am sure that, based on your query usage, you can opt for either a composite key or mix a composite key with a secondary index!
>
> Have a look at: http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
>
> Hope it helps.
>
> -Vivek
>
> On Wed, Aug 28, 2013 at 5:49 PM, Sávio Teles wrote:
> I can populate again. We are still modelling the data! Thanks.
>
> 2013/8/28 Vivek Mishra
> Just saw that you already have data populated, so I guess switching to a composite key may not work for you.
>
> -Vivek
>
> On Tue, Aug 27, 2013 at 11:55 PM, Sávio Teles wrote:
> Vivek, using a composite key, what would the query look like?
>
> 2013/8/27 Vivek Mishra
> For such queries, it looks like you could create a composite key of (user_id, age, salary).
>
> Too much indexing always kills (irrespective of RDBMS or NoSQL). Remember that every search request on a secondary index is passed to each node in the ring.
>
> -Vivek
>
> On Tue, Aug 27, 2013 at 11:11 PM, Sávio Teles wrote:
> Use a database that is designed for efficient range queries? ;D
>
> Is there no way to do this with Cassandra? Like using Hive, Solr...
>
> 2013/8/27 Robert Coli
> On Fri, Aug 23, 2013 at 5:53 AM, Sávio Teles wrote:
> I need to perform a range query efficiently.
> ...
> This query takes a long time to run. Any ideas to perform it efficiently?
>
> Use a database that is designed for efficient range queries? ;D
>
> =Rob
>
> --
> Regards,
> Sávio S. Teles de Oliveira
> voice: +55 62 9136 6996
> http://br.linkedin.com/in/savioteles
> Master's student in Computer Science - UFG
> Software Architect
> Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
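
For readers of the archive, the hand-maintained index described at the top of this message (a second column family that maps the indexed value to the key of the original column family) can be sketched roughly as follows. This is only an in-memory illustration in Python, not Cassandra client code: the names `users`, `users_by_age`, `put_user` and `find_by_age` are hypothetical, and indexing on `age` is an assumption, since the message does not say which column was indexed.

```python
from collections import defaultdict

# In-memory stand-ins for two column families (hypothetical names, not a Cassandra API).
users = {}                        # original column family: row key -> columns
users_by_age = defaultdict(set)   # index column family: indexed value -> row keys


def put_user(user_id, age, salary):
    """Write the row and keep the hand-maintained index in sync."""
    old = users.get(user_id)
    if old is not None:
        # Drop the stale index entry before re-indexing under the new value.
        users_by_age[old["age"]].discard(user_id)
    users[user_id] = {"age": age, "salary": salary}
    users_by_age[age].add(user_id)


def find_by_age(age):
    """Two lookups by key: the index row first, then the original rows."""
    keys = sorted(users_by_age.get(age, ()))
    return {user_id: users[user_id] for user_id in keys}
```

The point of the design is that both steps are plain lookups by key, so no node has to scan its local secondary index. The cost is that the application must update the index on every write, as `put_user` does here.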
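
The composite key (user_id, age, salary) suggested in the quoted thread can be pictured with a small in-memory model. The sketch below assumes that `user_id` is the partition key and that (age, salary) are clustering columns kept in sorted order inside each partition; the thread does not spell this out, so treat it as one possible reading. The names `partitions`, `insert` and `age_range` are hypothetical, and this is plain Python rather than Cassandra client code.

```python
import bisect
from collections import defaultdict

# partition key (user_id) -> list of clustering tuples (age, salary), kept sorted
partitions = defaultdict(list)


def insert(user_id, age, salary):
    """Add a row; insort keeps the partition ordered by (age, salary)."""
    bisect.insort(partitions[user_id], (age, salary))


def age_range(user_id, min_age, max_age):
    """Return rows of one partition with min_age <= age <= max_age."""
    rows = partitions.get(user_id, [])
    # (min_age,) sorts before every (min_age, salary) tuple.
    lo = bisect.bisect_left(rows, (min_age,))
    # (max_age, inf) sorts after every (max_age, salary) tuple.
    hi = bisect.bisect_right(rows, (max_age, float("inf")))
    return rows[lo:hi]
```

Under these assumptions the range is a contiguous slice of one sorted partition, which is why it is cheap. A range that spans many `user_id` values would still have to touch every partition, so this layout only helps when the query fixes the partition key.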