incubator-cassandra-user mailing list archives

From Schubert Zhang <zson...@gmail.com>
Subject Re: Cassandra data model for financial data
Date Thu, 29 Apr 2010 05:09:21 GMT
key: stock ID plus year, e.g. AAPL+year
column families: closing price and volume (two CFs)
column name: timestamp (LongType)

AAPL+2010-> CF:closingPrice -> {'04-13' : 242, '04-14': 245}
AAPL+2010-> CF:volume -> {'04-13' : 242, '04-14': 245}
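A minimal in-memory sketch of the model above, assuming a plain Python dict stands in for each column family (this is not a Cassandra client API). It shows one row per ticker per year and columns kept sorted by name. The 'MM-DD' names follow the example; being zero-padded, they sort chronologically just as LongType timestamps would. The volume figures are the ones from Steve's message quoted below. `row_key` and `insert` are hypothetical helpers.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a Cassandra column family: row key ->
# list of (column name, value) pairs kept sorted by name, the way a column
# comparator keeps a row's columns ordered.
ColumnFamily = dict[str, list[tuple[str, int]]]


def row_key(ticker: str, year: int) -> str:
    """Build a row key such as 'AAPL+2010' (one row per ticker per year)."""
    return f"{ticker}+{year}"


def insert(cf: ColumnFamily, key: str, column: str, value: int) -> None:
    """Insert a column, keeping names sorted; an existing name is overwritten."""
    row = cf.setdefault(key, [])
    names = [name for name, _ in row]
    i = bisect_left(names, column)
    if i < len(row) and row[i][0] == column:
        row[i] = (column, value)
    else:
        row.insert(i, (column, value))


# One column family per attribute, as in the model above.
closing_price: ColumnFamily = {}
volume: ColumnFamily = {}

key = row_key("AAPL", 2010)
insert(closing_price, key, "04-14", 245)  # inserted out of order on purpose
insert(closing_price, key, "04-13", 242)
insert(volume, key, "04-13", 10_900_000)
insert(volume, key, "04-14", 14_400_000)

print(closing_price[key])  # [('04-13', 242), ('04-14', 245)]
```

Bucketing the key by year keeps any single row from growing without bound, at the cost of touching one row per year when a date range crosses a year boundary.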


On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde <miguelitovert@gmail.com> wrote:

> On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn <stevelihn@gmail.com> wrote:
>
>> [...]
>
>
>
>> Design 1: Each attribute is a super column. Therefore each date is a
>> column. So we have:
>>
>> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
>> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
>> etc.
>>
> I would suggest not using this design, as each query involving an attribute
> will pull all dates for that attribute into memory on the server. For example,
> getting the closingPrice for AAPL on '2010-04-13' would pull all closing
> prices for AAPL across all dates into memory.
>
>
>>
>> Design 2: Each date is a super column. Therefore each attribute is a
>> column. So we have:
>>
>> AAPL -> '2010-04-13' -> { closingPrice -> 242, volume -> 10.9m }
>> AAPL -> '2010-04-14' -> {closingPrice -> 245, volume -> 14.4m }
>> etc.
>>
>> The date column / superColumn will need the Order Preserving Partitioner
>> since we are going to do a lot of range queries.
>
>
> Partitioners split up keys between nodes; the partitioner you use has no
> effect on your ability to query columns in a row.
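To illustrate the distinction, here is a small sketch under stated assumptions: `md5_token` is a hypothetical token function in the spirit of RandomPartitioner (which places rows by an MD5 hash of the row key), and `node_order` and `slice_columns` are hypothetical helpers. The partitioner only affects how row keys are laid out; slicing dates inside one row depends solely on column ordering.

```python
import hashlib


def md5_token(key: str) -> int:
    # Hypothetical token in the spirit of RandomPartitioner: MD5 of the row key.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def node_order(keys: list[str], order_preserving: bool) -> list[str]:
    # The partitioner only decides how row keys are laid out across nodes.
    return sorted(keys) if order_preserving else sorted(keys, key=md5_token)


def slice_columns(row: dict[str, int], start: str, finish: str) -> list[tuple[str, int]]:
    # Columns within a row are ordered by the column comparator, so a
    # date-range slice never consults the partitioner.
    return [(name, row[name]) for name in sorted(row) if start <= name <= finish]


keys = ["MSFT+2010", "AAPL+2010", "AAPL+2009"]
print(node_order(keys, order_preserving=True))   # ['AAPL+2009', 'AAPL+2010', 'MSFT+2010']
print(node_order(keys, order_preserving=False))  # order depends on the hash values

aapl_2010 = {"04-14": 245, "04-13": 242}
print(slice_columns(aapl_2010, "04-13", "04-14"))  # [('04-13', 242), ('04-14', 245)]
```

The partitioner choice matters for range scans over row keys, not for column slices within a row.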
>
>
>> Examples are:
>> Query 1: Give me the data between date1 and date2 for a set of tickers
>> (say, the 100 tickers in QQQ).
>>
> You could use http://wiki.apache.org/cassandra/API#multiget_slice for
> this.
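A rough in-memory model of what multiget_slice does for Query 1, assuming rows are lists of columns sorted by name: one column range is applied to every requested row key. The real call takes a list of keys and a slice predicate. `multiget_slice_sketch` and `slice_row` are hypothetical names, and returning an empty list for a key with no data is this sketch's choice.

```python
from bisect import bisect_left, bisect_right

Row = list[tuple[str, int]]  # (column name, value), sorted by column name


def slice_row(row: Row, start: str, finish: str) -> Row:
    # Inclusive [start, finish] slice over the row's sorted column names.
    names = [name for name, _ in row]
    return row[bisect_left(names, start):bisect_right(names, finish)]


def multiget_slice_sketch(cf: dict[str, Row], keys: list[str],
                          start: str, finish: str) -> dict[str, Row]:
    # Apply one column range to every requested row; unknown keys yield [].
    return {key: slice_row(cf.get(key, []), start, finish) for key in keys}


closing_price = {"AAPL+2010": [("04-13", 242), ("04-14", 245)]}
result = multiget_slice_sketch(closing_price, ["AAPL+2010", "MSFT+2010"],
                               "04-14", "04-30")
print(result)  # {'AAPL+2010': [('04-14', 245)], 'MSFT+2010': []}
```

For the 100 tickers in QQQ, the key list would simply hold 100 row keys, one per ticker (and per year, if keys are bucketed by year).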
>
>
>> Query 2: More often than not, the query is: Give me the data for the max
>> available dates (for each ticker) between date1 and date2 in a set of
>> tickers.
>> (Since not every day is traded, and we only want the most recent data,
>> given a range of dates.)
>>
> A http://wiki.apache.org/cassandra/API#SliceRange allows you to specify
> limits and ordering for columns you are slicing.
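A sketch of Query 2 in terms of SliceRange's start, finish, reversed and count fields, modeled in memory (the function names are hypothetical). A reversed slice runs from start down to finish, so start is the later date. With count=1 it returns the most recent traded date in [date1, date2], skipping days with no data.

```python
from bisect import bisect_left, bisect_right

Row = list[tuple[str, int]]  # (column name, value), sorted by column name


def slice_range(row: Row, start: str, finish: str,
                reversed_: bool = False, count: int = 100) -> Row:
    # Modeled on SliceRange: when reversed, columns come back from start
    # down to finish, so start must be the larger bound.
    names = [name for name, _ in row]
    if reversed_:
        hits = row[bisect_left(names, finish):bisect_right(names, start)]
        return hits[::-1][:count]
    return row[bisect_left(names, start):bisect_right(names, finish)][:count]


def latest_between(row: Row, date1: str, date2: str) -> Row:
    # Newest available column in [date1, date2]: a reversed slice
    # starting at date2 with count=1.
    return slice_range(row, start=date2, finish=date1, reversed_=True, count=1)


aapl_2010 = [("04-13", 242), ("04-14", 245)]
print(latest_between(aapl_2010, "04-01", "04-18"))  # [('04-14', 245)]
print(latest_between(aapl_2010, "04-01", "04-13"))  # [('04-13', 242)]
print(latest_between(aapl_2010, "04-15", "04-18"))  # []
```

Placing the same reversed, count=1 range in a multiget_slice predicate should answer Query 2 for every ticker in a single request.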
>
>
