incubator-cassandra-user mailing list archives

From Rajith Siriwardana <rajithsiriward...@gmail.com>
Subject Re: Organize model for range scans in Cassandra
Date Fri, 18 Oct 2013 06:56:56 GMT
Hi Jon,

Thanks for the quick reply. I'm a newbie to Cassandra. Even though I made a
mistake in my previous mail, you got it right. I'll check what you've said.

Cheers,
Rajith.


On Fri, Oct 18, 2013 at 11:47 AM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> I'd avoid using super columns.  I don't believe they're recommended
> anymore, and with CQL3 they aren't even supported (if you're interested in
> going that route).  I think it's unlikely that you'll want a column family
> per company either.
>
> How many "ticker" entries do you plan on writing per company?  You've got
> a lot of ellipses in there as well, which makes me wonder what other data
> you're looking to store.
>
> To take a guess, I'd wager you'd be looking for a trades table, and
> another table that tracks the closing price per day.  In the trades table,
> something along the lines of this CQL3 definition might be helpful:
>
> create table trades (
> company text,
> ts timeuuid,
> price decimal,
> primary key(company, ts) );
>
> This would give you a single row in the traditional Cassandra sense, and
> it would be ordered by the timestamp you supply.  You can use a timeuuid to
> avoid the duplicate timestamp problem.
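>
> As a rough sketch, writing a trade and then scanning a time range for one
> company could look like the following. The company name, price, and dates
> are made up for illustration, and this assumes the now() and minTimeuuid()
> functions are available in your CQL3 version:
>
> ```cql
> -- Hypothetical values: 'ACME', the price, and the dates are placeholders.
> -- now() generates a fresh timeuuid, so two trades in the same
> -- millisecond still get distinct clustering keys.
> INSERT INTO trades (company, ts, price)
> VALUES ('ACME', now(), 101.25);
>
> -- Range scan within one partition (company); rows come back
> -- ordered by ts because ts is the clustering column.
> SELECT ts, price
> FROM trades
> WHERE company = 'ACME'
>   AND ts >= minTimeuuid('2013-10-01 00:00+0000')
>   AND ts <  minTimeuuid('2013-10-18 00:00+0000');
> ```
>
> The query restricts the partition key with an equality and the clustering
> column with a range, which is the shape of query this layout is meant for.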
>
> This is about as far as I can go without knowing more about what you're
> actually trying to do...  I think it's going to be difficult for anyone to
> give you helpful advice unless you can elaborate a bit on what your
> requirements are.
>
> Jon
>
>
>
> On Thu, Oct 17, 2013 at 10:51 PM, Rajith Siriwardana <
> rajithsiriwardana@gmail.com> wrote:
>
>> Hi all,
>>
>> I have the following problem.
>>
>> I have stock transaction data, structured as follows.
>> Ticker data:
>>     Company name:
>>              timestamp:
>>                  closing price (N): (V)
>>                  trades (N) : (V)
>>                  ......
>>              .....
>>     ......
>>
>> In my model, I want to execute range queries on timestamps and get the
>> results in sorted order.
>>
>> The approaches I currently have in mind:
>>
>>      1. Ticker data as the column family, company name as the row key,
>> timestamp as a super column, and the other attributes as columns. This
>> gives around *100 row keys*, around *1M timestamps*, and around *10
>> columns under one super column*.
>>
>>         Problems:
>>
>>         - Cassandra best practice is to use the RandomPartitioner, which
>>           gives you 'free' load balancing as long as your tokens are evenly
>>           distributed. Here the load would be balanced across only 100 row
>>           keys. Is this an acceptable approach?
>>         - Timestamps may contain duplicates, which would be a problem.
>>
>>
>>      2. Ticker data as the keyspace, company name as the column family,
>> timestamp as the row key, and the other attributes as columns. This
>> gives around *100 column families*, around *1M row keys*, and around *10
>> columns per row*.
>>
>>         Problems:
>>
>>         - Range queries do not return rows in sorted order.
>>         - I guess there is also a duplicate row key problem.
>>
>> Any suggestions on how I can overcome this?
>>
>> Cheers,
>> Rajith
>>
>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
