From <Stephen.M.Thomp...@wellsfargo.com>
Subject RE: Date Index?
Date Wed, 09 Jan 2013 21:37:19 GMT
OK ... I think I understand these.  So the idea is that you would use the time as the column
key?

So where I might have something like this:

<key1> | time=2013/01/03 08:19:01 | user=john | site=Chicago
<key2> | time=2013/01/05 01:55:34 | user=john | site=Chicago
<key3> | time=2013/01/09 16:21:42 | user=john | site=New York
<key4> | time=2013/01/09 17:27:41 | user=susan | site=Boston
<key5> | time=2013/01/09 17:27:41 | user=asok | site=Dallas

Instead it would be better to do something like this:

<key1> | 2013/01/03 08:19:01={user=john, site=Chicago} | 2013/01/05 01:55:34={user=john, site=Chicago} | 2013/01/09 16:21:42={user=john, site=New York}
<key2> | 2013/01/09 17:27:41={user=susan, site=Boston}
<key3> | 2013/01/09 17:27:41={user=asok, site=Dallas}

Am I understanding this correctly?  This seems to have the HUGE disadvantage that I am no
longer going to be able to create secondary indexes on user and site.  Is that right?

This seems like an impossible solution for my requirements.

Steve
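
For readers following the layout above, here is a minimal sketch of the "time as the column key" idea. It is a toy in-memory model, not Cassandra client code: the WideRow class and its insert/slice methods are hypothetical names made up for illustration, and it assumes timestamps are fixed-width strings so that plain string ordering matches time ordering.

```python
from bisect import bisect_left, bisect_right, insort


class WideRow:
    """Toy stand-in for one wide row whose column names are timestamps."""

    def __init__(self):
        self._names = []   # column names, kept sorted as a comparator would
        self._values = {}  # column name -> column value

    def insert(self, name, value):
        # Only add the name to the sorted list the first time it is seen;
        # a repeated name simply overwrites the stored value.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Sample data taken from the rows in the message above.
john = WideRow()
john.insert("2013/01/03 08:19:01", {"user": "john", "site": "Chicago"})
john.insert("2013/01/05 01:55:34", {"user": "john", "site": "Chicago"})
john.insert("2013/01/09 16:21:42", {"user": "john", "site": "New York"})

recent = john.slice("2013/01/05 00:00:00", "2013/01/09 23:59:59")
for name, value in recent:
    print(name, value)
```

Because the column names are kept sorted, a date-range read becomes a contiguous slice of one row rather than a scan over many rows. Lookups by user or site would still need their own rows or indexes, which is the trade-off raised above.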

From: Tyler Hobbs [mailto:tyler@datastax.com]
Sent: Wednesday, January 09, 2013 2:21 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

If you're going to be looking data up by date ranges frequently, I strongly suggest you go
with a typical time-series pattern (what Aaron described as hand-rolled indexes):

http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

If you're just running these date-based queries occasionally and the result set won't be huge,
then using secondary indexes as you described is a convenient but not terribly efficient way
to do that.
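
One common ingredient of that pattern is splitting the data into one row per time bucket and working out which row keys a date range touches. The following is a rough sketch under stated assumptions: the helper name month_buckets and the "YYYYMM" key format are illustrative choices, not something prescribed by the linked posts.

```python
from datetime import date


def month_buckets(start, end):
    """Return the 'YYYYMM' row keys covering start..end inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


# A query from mid-November 2012 to 9 January 2013 touches three rows.
print(month_buckets(date(2012, 11, 15), date(2013, 1, 9)))
```

The application would then read a column slice from each of those rows and concatenate the results. The bucket size (month, day, hour) is a tuning choice that depends on how much data lands in each bucket.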

On Wed, Jan 9, 2013 at 10:04 AM, Michael Kjellman <mkjellman@barracuda.com> wrote:
ElasticSearch is a nice option for ordered lists. In 2.0, triggers should make pushing updates
to ElasticSearch much easier; right now it's up to your application logic to detect changes and
update.

On Jan 9, 2013, at 7:55 AM, "Stephen.M.Thompson@wellsfargo.com" <Stephen.M.Thompson@wellsfargo.com>
wrote:
Thanks Aaron, that helps.  So is there anything approaching a "consensus" on how to do something
like this?

You mention a custom index ... is there a good document on creating a custom index?  Google
doesn't show me much.

Steve

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, January 08, 2013 9:35 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

There has to be one equality clause in there, and that's the thing Cassandra uses to select
off disk. The others are in-memory filters.

So if you have one on the year+month you can have a simple select clause and it limits the
amount of data that has to be read.

If you have many tens to hundreds of millions of things in the same month, you may want to do
some performance testing. There can still be times when you want to support common read paths
by using custom / hand-rolled indexes.
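
To make the "one equality clause, the rest are in-memory filters" point concrete, here is a small illustrative sketch. It only imitates the behaviour in plain Python: month_index and query are hypothetical names, and the month and day values are derived from the sample timestamps posted elsewhere in this thread.

```python
from collections import defaultdict

# Sample rows based on the data posted in this thread.
records = [
    {"key": "key1", "month": "201301", "day": "03", "user": "john", "site": "Chicago"},
    {"key": "key2", "month": "201301", "day": "05", "user": "john", "site": "Chicago"},
    {"key": "key3", "month": "201301", "day": "09", "user": "john", "site": "New York"},
    {"key": "key4", "month": "201301", "day": "09", "user": "susan", "site": "Boston"},
    {"key": "key5", "month": "201301", "day": "09", "user": "asok", "site": "Dallas"},
]

# The "index": one equality lookup on month selects the candidate rows.
month_index = defaultdict(list)
for record in records:
    month_index[record["month"]].append(record)


def query(month, **filters):
    """Select by month via the index, then filter the candidates in memory."""
    candidates = month_index.get(month, [])
    return [
        r for r in candidates
        if all(r.get(col) == val for col, val in filters.items())
    ]


print([r["key"] for r in query("201301", day="09", user="john")])
```

Every row for the month is read before the other conditions are applied, so the cost grows with the size of the month, not with the size of the final result. That is why very large months are worth performance testing.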

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 6:05 AM, Stephen.M.Thompson@wellsfargo.com wrote:

Hi folks -

Question about secondary indexes.  How are people doing date indexes?  I have a date column
in my tables in RDBMS that we use frequently, for example to look at all records recorded in
the last month.  What is the best practice for being able to do such a query?  It seems like
there could be an advantage to adding a couple of columns like this:

                {timestamp=2013/01/08 12:32:01 -0500}
                {month=201301}
                {day=08}

And then I could do secondary index on the month and day columns?  Would that be the best
way to do something like this?  Is there any accepted "best practice" on this yet?
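
As an illustration of the proposal above, the extra columns could be derived from the timestamp at write time roughly like this. The function name date_columns is hypothetical, and the sketch assumes the bucket values should follow the local date as written in the timestamp, not the UTC date.

```python
from datetime import datetime


def date_columns(raw):
    """Derive the month and day columns from a 'YYYY/MM/DD HH:MM:SS +ZZZZ' string."""
    ts = datetime.strptime(raw, "%Y/%m/%d %H:%M:%S %z")
    return {
        "timestamp": raw,
        "month": ts.strftime("%Y%m"),  # e.g. 201301
        "day": ts.strftime("%d"),      # e.g. 08
    }


columns = date_columns("2013/01/08 12:32:01 -0500")
print(columns)
```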

Thanks!
Steve



--
Tyler Hobbs
DataStax<http://datastax.com/>
