From: Stephen.M.Thompson@wellsfargo.com
To: user@cassandra.apache.org
Date: Wed, 9 Jan 2013 16:37:19 -0500
Subject: RE: Date Index?

OK ... I think I understand these. So the idea is that you would use the time as the column key?

So where I might have something like this:

<key1> | time=2013/01/03 08:19:01 | user=john | site=Chicago
<key2> | time=2013/01/05 01:55:34 | user=john | site=Chicago
<key3> | time=2013/01/09 16:21:42 | user=john | site=New York
<key4> | time=2013/01/09 17:27:41 | user=susan | site=Boston
<key5> | time=2013/01/09 17:27:41 | user=asok | site=Dallas

Instead it would be better to do something like this:

<key1> | 2013/01/03 08:19:01={user=john, site=Chicago} | 2013/01/05 01:55:34={user=john, site=Chicago} | 2013/01/09 16:21:42={user=john, site=New York}
<key2> | 2013/01/09 17:27:41={user=susan, site=Boston}
<key3> | 2013/01/09 17:27:41={user=asok, site=Dallas}

Am I understanding this correctly? This seems to have the HUGE disadvantage that I am no longer going to be able to create secondary indexes on user and site. Is that right?

This seems like an impossible solution for my requirements.

Steve

From: Tyler Hobbs [mailto:tyler@datastax.com]
Sent: Wednesday, January 09, 2013 2:21 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

If you're going to be looking data up by date ranges frequently, I strongly suggest you go with a typical time-series pattern (what Aaron described as hand-rolled indexes):

http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

If you're just running these date-based queries occasionally and the result set won't be huge, then using secondary indexes as you described is a convenient but not terribly efficient way to do that.

On Wed, Jan 9, 2013 at 10:04 AM, Michael Kjellman <mkjellman@barracuda.com> wrote:

ElasticSearch is a nice option for ordered lists. In 2.0, triggers should make it much easier to push updates to ElasticSearch; right now it's up to your application logic to detect changes and update.

On Jan 9, 2013, at 7:55 AM, "Stephen.M.Thompson@wellsfargo.com" <Stephen.M.Thompson@wellsfargo.com> wrote:

Thanks Aaron, that helps. So is there anything approaching a "consensus" on how to do something like this?

You mention a custom index ... is there a good document on creating a custom index? Google doesn't show me much.

Steve

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, January 08, 2013 9:35 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

There has to be one equality clause in there, and that's the thing Cassandra uses to select off disk. The others are in-memory filters.

So if you have one on the year+month you can have a simple select clause, and it limits the amount of data that has to be read.

If you have many tens to hundreds of millions of things in the same month, you may want to do some performance testing. There can still be times when you want to support common read paths by using custom / hand-rolled indexes.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 6:05 AM, Stephen.M.Thompson@wellsfargo.com wrote:

Hi folks -

Question about secondary indexes. How are people doing date indexes? I have a date column in my tables in RDBMS that we use frequently, such as to look at all records recorded in the last month. What is the best practice for being able to do such a query? It seems like there could be an advantage to adding a couple of columns like this:

    {timestamp=2013/01/08 12:32:01 -0500}
    {month=201301}
    {day=08}

And then I could do a secondary index on the month and day columns? Would that be the best way to do something like this? Is there any accepted "best practice" on this yet?

Thanks!
Steve

--
Tyler Hobbs
DataStax

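For readers following the thread, the time-series pattern discussed above (timestamp as the column name, rows bucketed by month) could be sketched roughly as below. This is only an in-memory illustration in Python, not Cassandra client code: the `rows` dictionary, the "YYYYMM" bucket format, and the helper names `insert`, `buckets_between`, and `query` are all assumptions made for the example. The sample events are the ones from Steve's message.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-in for one column family:
# row key = month bucket ("YYYYMM"), columns kept sorted by timestamp.
rows = defaultdict(list)

def parse(text):
    return datetime.strptime(text, "%Y/%m/%d %H:%M:%S")

def insert(ts, user, site):
    # Tuples sort by timestamp first, then user, so events that share
    # a second stay in a stable order instead of overwriting each other.
    insort(rows[ts.strftime("%Y%m")], (ts, user, site))

def buckets_between(start, end):
    """Yield every YYYYMM bucket touched by the range [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"{year:04d}{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def query(start, end):
    """Return events with start <= ts <= end, reading only the needed buckets."""
    found = []
    for key in buckets_between(start, end):
        cols = rows.get(key, [])
        times = [col[0] for col in cols]
        # Slice the sorted columns, analogous to a column range read on one row.
        found.extend(cols[bisect_left(times, start):bisect_right(times, end)])
    return found

for text, user, site in [
    ("2013/01/03 08:19:01", "john", "Chicago"),
    ("2013/01/05 01:55:34", "john", "Chicago"),
    ("2013/01/09 16:21:42", "john", "New York"),
    ("2013/01/09 17:27:41", "susan", "Boston"),
    ("2013/01/09 17:27:41", "asok", "Dallas"),
]:
    insert(parse(text), user, site)

jan_9 = query(parse("2013/01/09 00:00:00"), parse("2013/01/09 23:59:59"))
```

In this sketch, any further filtering on user or site (for example, `[e for e in jan_9 if e[1] == "john"]`) happens in the application after the range read, which is the trade-off Steve raises at the top of the thread.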