Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 03A62C432 for ; Wed, 12 Nov 2014 18:38:44 +0000 (UTC) Received: (qmail 35821 invoked by uid 500); 12 Nov 2014 18:38:41 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 35780 invoked by uid 500); 12 Nov 2014 18:38:41 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 35770 invoked by uid 99); 12 Nov 2014 18:38:41 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Nov 2014 18:38:41 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of jonathan.haddad@gmail.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qa0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Nov 2014 18:38:14 +0000 Received: by mail-qa0-f44.google.com with SMTP id w8so9109146qac.3 for ; Wed, 12 Nov 2014 10:37:28 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=mime-version:references:from:date:message-id:subject:to :content-type; bh=z8eYvumU696KSa51EHWNmQGQyxO2dNSH9+9KB6oSLxY=; b=ANNiJVl9+lQe9bEsuVjMzqK5sSSXeUirGTTqUZVlGEN5iF4yEE+t+RVM0IsE5PB0aL ZZmQrmGTE1T19ew5i4Epb2ywQWmnd+yPzoNXu4zsFB48WGc45J17JA5w6n1tOammc9Wm L3E5jCGwNoeJnwEzq1bbHkb5wkAWSWi+bouKDjuxRYvQPZokDw2JmZ/CGY1HOmUU1KHl 0arhEm9zU9O8cJkh7wLjGvoDGHiBNEBgyJP2q8VbO4jxkIQYC+A3/m914TtChn/9L82/ +FDVnhKKCXuUWnuKkyFAFMRGJHmq7dCjZ+UBIrNZ646fmofuquFED2PKY6WiidR5FJa7 Fg7w== X-Received: by 10.224.74.194 with SMTP id v2mr42352616qaj.60.1415817448547; Wed, 12 Nov 2014 10:37:28 -0800 (PST) MIME-Version: 1.0 References: From: Jonathan Haddad Date: Wed, 12 Nov 2014 18:37:27 +0000 Message-ID: Subject: Re: Cassandra sort using updatable query To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=089e0158b34c166f3d0507adb05b X-Virus-Checked: Checked by ClamAV on apache.org --089e0158b34c166f3d0507adb05b Content-Type: text/plain; charset=UTF-8 With Cassandra you're going to want to model tables to meet the requirements of your queries instead of like a relational database where you build tables in 3NF then optimize after. For your optimized select query, your table (with caveat, see below) could start out as: create table words ( year int, frequency int, content text, primary key (year, frequency, content) ); You may want to maintain other tables as well for different types of select statements. Your UPDATE statement above won't work, you'll have to DELETE and INSERT, since you can't change the value of a clustering column. If you don't know what your old frequency is ahead of time (to do the delete), you'll need to keep another table mapping content,year -> frequency. Now, the tricky part here is that the above model will limit the total number of partitions you've got to the number of years you're working with, and will not scale as the cluster increases in size. Ideally you could bucket frequencies. If that feels like too much work (it's starting to for me), this may be better suited to something like solr, elastic search, or DSE (cassandra + solr). Does that help? Jon On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna < cdwijayarathna@gmail.com> wrote: > Hello all, > > I have a data set with attributes content and year. I want to put them in > to CF 'words' with attributes ('content','year','frequency'). The CF should > support following operations. > > - Frequency attribute of a column can be updated (i.e. - : can run > query like "UPDATE words SET frequency = 2 WHERE content='abc' AND > year=1990;), where clause should contain content and year > - Should support select query like "Select content from words where > year = 2010 ORDER BY frequency DESC LIMIT 10;" (where clause only has year) > where results can be ordered using frequency > > Is this kind of requirement can be fulfilled using Cassandra? What is the > CF structure and indexing I need to use here? What queries should I use to > create CF and in indexing? > > > Thank You! > > > > -- > *Chamila Dilshan Wijayarathna,* > SMIEEE, SMIESL, > Undergraduate, > Department of Computer Science and Engineering, > University of Moratuwa. > --089e0158b34c166f3d0507adb05b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable With Cassandra you're going to want to model tables to meet the require= ments of your queries instead of like a relational database where you build= tables in 3NF then optimize after.

For your optimized select query,= your table (with caveat, see below) could start out as:

create table words (
=C2=A0 year int,
=C2=A0 frequency= int,
=C2=A0 content text,
=C2=A0 primary key (year, fr= equency, content) );

You may want to maintain othe= r tables as well for different types of select statements. =C2=A0

Your UPDATE statement above won't work, you'll have= to DELETE and INSERT, since you can't change the value of a clustering= column.=C2=A0 If you don't know what your old frequency is ahead of ti= me (to do the delete), you'll need to keep another table mapping conten= t,year -> frequency.

Now, the tricky part here = is that the above model will limit the total number of partitions you'v= e got to the number of years you're working with, and will not scale as= the cluster increases in size.=C2=A0 Ideally you could bucket frequencies.= =C2=A0 If that feels like too much work (it's starting to for me), this= may be better suited to something like solr, elastic search, or DSE (cassa= ndra=C2=A0+ solr).

Does that help?

<= /div>
Jon





On Wed Nov 12 2014 at 9:01:44 AM Cha= mila Wijayarathna <cdwijayar= athna@gmail.com> wrote:
Hello all,

I have a data set with attributes content and year. I w= ant to put them in to CF 'words' with attributes ('content'= ,'year','frequency'). The CF should support following opera= tions.

  • Frequency attribute of a column can be up= dated (i.e. - : can run query like "UPDATE words SET frequency =3D 2 W= HERE content=3D'abc' AND year=3D1990;), where clause should contain= content and year
  • Should support select query like = "Select content from words where year =3D 2010 ORDER BY frequency DESC= LIMIT 10;" (where clause only has year) where results can be ordered = using frequency

Is this kind of requirement can be fulfilled using Cassandra? What is= the CF structure and indexing I need to use here? What queries should I us= e to create CF and in indexing?


Thank You!



--
Cha= mila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate,
Depa= rtment of Computer Science and Engineering,
University of Moratuwa.
<= /div>
--089e0158b34c166f3d0507adb05b--