cassandra-user mailing list archives

From Rajith Siriwardana <>
Subject Organize model for range scans in Cassandra
Date Fri, 18 Oct 2013 05:51:25 GMT
Hi all,

I have the following problem.

I have stock transaction data, structured like this:
Ticker data:
    Company name:
                 closing price (N): (V)
                 trades (N): (V)

In my model, I want to execute range queries on timestamps, with results
returned in sorted order.

The approaches I currently have in mind:

     1. Ticker data as the column family, company name as the row key,
timestamp as the super column, and other attributes as columns. This way
there will be around *100 row keys*, around *1M timestamps* per row, and
around *10 columns under one super column*.

   - Cassandra best practice is to use the RandomPartitioner, which gives
     you 'free' load balancing as long as your tokens are evenly
     distributed. Here the load balancing would happen across only 100 row
     keys. Is that acceptable?
   - There is also the possibility of duplicate timestamps, which would be
     a problem.
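To make approach 1 concrete, here is a toy Python sketch (not Cassandra driver code; all names are made up) of a wide row whose columns are kept sorted by timestamp, which is what makes in-row timestamp range scans cheap. It also shows the duplicate-timestamp problem: a second write to the same timestamp silently overwrites the first.

```python
import bisect

# Toy model of approach 1: one wide row per company, with super columns
# keyed by timestamp and kept in sorted order, mimicking how Cassandra
# sorts columns within a row.
class CompanyRow:
    def __init__(self):
        self._timestamps = []   # sorted column keys
        self._columns = {}      # timestamp -> {attribute: value}

    def insert(self, ts, attributes):
        if ts not in self._columns:
            bisect.insort(self._timestamps, ts)
        # NOTE: a duplicate timestamp silently overwrites the earlier
        # columns -- the duplicate-timestamp problem mentioned above.
        self._columns[ts] = attributes

    def range_scan(self, start_ts, end_ts):
        # Inclusive range scan over the sorted column keys.
        lo = bisect.bisect_left(self._timestamps, start_ts)
        hi = bisect.bisect_right(self._timestamps, end_ts)
        return [(ts, self._columns[ts]) for ts in self._timestamps[lo:hi]]

row = CompanyRow()
row.insert(1382075400, {"closing_price": 10.5, "trades": 300})
row.insert(1382075460, {"closing_price": 10.6, "trades": 120})
row.insert(1382075520, {"closing_price": 10.4, "trades": 500})
results = row.range_scan(1382075400, 1382075460)
```

One common workaround for the overwrite behavior is to make the column key composite (timestamp plus a unique suffix such as a TimeUUID), so two trades at the same instant get distinct columns.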

    2. Ticker data as the keyspace, company name as the column family,
timestamp as the row key, and other attributes as columns. This way there
will be around *100 column families*, around *1M row keys*, and around *10
columns per row*.

   - With this layout, row-key range queries are not returned in sorted
     order (under the RandomPartitioner).
   - I guess there is also a duplicate row key problem.
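The sort-order problem in approach 2 can be illustrated with a small sketch (my own toy example, with made-up key strings): the RandomPartitioner places and iterates rows in order of the MD5 hash of the row key, not the key itself, so timestamp row keys come back effectively shuffled.

```python
import hashlib

# Toy illustration of why approach 2 loses sort order: under the
# RandomPartitioner, rows are ordered by md5(row_key), not by row_key.
timestamps = ["20131018T0500", "20131018T0501", "20131018T0502"]

token_order = sorted(timestamps,
                     key=lambda k: hashlib.md5(k.encode()).digest())
# token_order is generally NOT the chronological order of the keys,
# so a row-key range scan does not return timestamps in sorted order.
```

The ByteOrderedPartitioner would keep row keys sorted, but it trades away the 'free' load balancing mentioned above, since sequential timestamps then land on the same nodes.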

Any suggestions on how I can overcome this?

