incubator-cassandra-user mailing list archives

From Rajith Siriwardana <rajithsiriward...@gmail.com>
Subject Organize model for range scans in Cassandra
Date Fri, 18 Oct 2013 05:51:25 GMT
Hi all,

I have the following problem.

I have stock transaction data, structured as follows.
Ticker data:
    Company name:
             timestamp:
                 closing price (N): (V)
                 trades (N) : (V)
                 ......
             .....
    ......

In my model, I want to run range queries on timestamps and get the results back in sorted order.


The approaches I currently have in mind:

     1. Ticker data as the column family, company name as the row key,
timestamp as the super column, and the other attributes as columns. This
gives around *100 row keys*, around *1M timestamps*, and around *10
columns under each super column*.
           Problems

   - Cassandra best practice is to use the RandomPartitioner, which gives
      you 'free' load balancing as long as your tokens are evenly distributed.
      Here the load would be balanced across only 100 row keys. Is this an
      acceptable approach?
   - Duplicate timestamps are possible, and that would be a problem.
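
To make approach 1 concrete, here is a minimal Python sketch of the layout. It is a toy in-memory model, not Cassandra client code: the `WideRow` class, the `record` helper, the "ACME" ticker and all values are hypothetical. It assumes only that entries inside one row are kept sorted by timestamp and that writing the same timestamp twice replaces the earlier entry.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy stand-in for one row whose entries stay sorted by timestamp."""

    def __init__(self):
        self._names = []   # timestamps, kept in sorted order
        self._values = {}  # timestamp -> dict of attributes

    def insert(self, ts, attrs):
        if ts not in self._values:
            self._names.insert(bisect_left(self._names, ts), ts)
        # Writing an existing timestamp replaces the earlier entry.
        self._values[ts] = attrs

    def slice(self, start, end):
        """Return entries with start <= timestamp <= end, in sorted order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(ts, self._values[ts]) for ts in self._names[lo:hi]]


ticker_data = {}  # row key (company name) -> WideRow


def record(company, ts, attrs):
    ticker_data.setdefault(company, WideRow()).insert(ts, attrs)


# Hypothetical sample data, inserted out of order on purpose.
record("ACME", 1382075485, {"closing_price": 10.5, "trades": 120})
record("ACME", 1382075400, {"closing_price": 10.4, "trades": 80})
record("ACME", 1382075485, {"closing_price": 10.6, "trades": 5})  # duplicate timestamp

result = ticker_data["ACME"].slice(1382075400, 1382075500)
```

In this sketch the range query comes back ordered by timestamp, but the second write for 1382075485 silently replaces the first, which is the duplicate-timestamp concern raised above.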


    2. Ticker data as the keyspace, company name as the column family,
timestamp as the row key, and the other attributes as columns. This gives
around *100 column families*, around *1M row keys*, and around *10
columns per row*.

          Problems

   - With this layout, range queries do not return rows in sorted order.
   - I guess there is also a duplicate row key problem.
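
The ordering problem in approach 2 can be illustrated with a small Python sketch. It assumes, as a simplification, that a hash partitioner such as the RandomPartitioner places each row by an MD5-derived token of its key; the `token` function and the sample timestamps are hypothetical and only approximate that behaviour.

```python
import hashlib


def token(row_key):
    """Simplified token: the MD5 digest of the row key read as an integer."""
    return int(hashlib.md5(str(row_key).encode("utf-8")).hexdigest(), 16)


# Twenty hypothetical timestamps used as row keys, one minute apart.
keys = [1382075400 + 60 * i for i in range(20)]

# A scan over rows follows token order, not row key order.
by_token = sorted(keys, key=token)

for ts in by_token:
    print(ts, token(ts))
```

Because the tokens are hash values, walking the rows in token order visits the timestamps in an effectively arbitrary sequence, so a range of row keys does not map to a contiguous, sorted range of rows.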

Any suggestions on how I can overcome these problems?

Cheers,
Rajith
