incubator-cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Organize model for range scans in Cassandra
Date Fri, 18 Oct 2013 06:17:34 GMT
I'd avoid using super columns.  I don't believe they're recommended
anymore, and with CQL3 they aren't even supported (if you're interested in
going that route).  I think it's unlikely that you'll want a column family
per company either.

How many "ticker" entries do you plan on writing per company?  You've got a
lot of ellipses in there as well, which makes me wonder what other data
you're looking to store.

To take a guess, I'd wager you'd be looking for a trades table, and another
table that tracks the closing price per day.  In the trades table,
something along the lines of this CQL3 definition might be helpful:

create table trades (
    company text,      -- partition key: one wide row per company
    ts timeuuid,       -- clustering column: trades are sorted by time
    price decimal,
    primary key (company, ts)
);

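This would give you a single row per company in the traditional Cassandra
sense, with the trades inside it ordered by the timestamp you supply.
Because each timeuuid is unique, two trades that happen at the same instant
won't overwrite each other, which avoids the duplicate timestamp problem.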
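
With that table, the timestamp range scan you asked about would look roughly
like the sketch below.  The company name 'ACME' and the dates are made up for
illustration, and the daily_close table is only my guess at what a per-day
closing price table might look like, so adjust the names and types to your
data.

```cql
-- Trades for one company in a time window, returned in ts order.
select dateOf(ts), price
from trades
where company = 'ACME'
  and ts >= minTimeuuid('2013-10-17 00:00+0000')
  and ts <  minTimeuuid('2013-10-18 00:00+0000');

-- The ten most recent trades for the same company.
select dateOf(ts), price
from trades
where company = 'ACME'
order by ts desc
limit 10;

-- Hypothetical table for the closing price per day (names are assumptions).
create table daily_close (
    company text,
    day timestamp,
    closing_price decimal,
    primary key (company, day)
);
```

Both queries restrict the partition key to a single company, so each one reads
from one partition, and minTimeuuid() turns a plain timestamp into the smallest
timeuuid for that instant so it can be compared against the ts column.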

This is about as far as I can go without knowing more about what you're
actually trying to do...  I think it's going to be difficult for anyone to
give you helpful advice unless you can elaborate a bit on what your
requirements are.

Jon



On Thu, Oct 17, 2013 at 10:51 PM, Rajith Siriwardana <
rajithsiriwardana@gmail.com> wrote:

> Hi all,
>
> I have a problem like this,
>
> I have stock transaction data, as follows.
> Ticker data:
>     Company name:
>              timestamp:
>                  closing price (N): (V)
>                  trades (N) : (V)
>                  ......
>              .....
>     ......
>
> In my model, I want to execute range queries on timestamps (in sorted
> order).
>
> The approaches I currently have in mind:
>
>      1. Ticker data: column family, company name: row key, timestamp:
> super column, and other attributes as columns. This way there will be
> around *100 row keys*, around *1M timestamps*, and around *10 columns
> under each super column*.
>
>         Problems
>
>    - Cassandra best practice is to use the RandomPartitioner, which
>      gives you 'free' load balancing as long as your tokens are evenly
>      distributed. So the load balancing would happen over only 100 row
>      keys. Is this an acceptable approach?
>    - There is a possibility of duplicate timestamps, which would be a
>      problem.
>
>
>     2. Ticker data: keyspace, company name: column family, timestamp:
> row key, and other attributes as columns. This way there will be around
> *100 column families*, around *1M row keys*, and around *10 columns per
> row*.
>
>         Problems
>
>    - This way, range queries do not return results in sorted order.
>    - I guess there is also a duplicate row key problem.
>
> Any suggestions how I can overcome this?
>
> Cheers,
> Rajith
>
>
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
