cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Need a little help with data model design
Date Mon, 05 Jul 2010 16:59:55 GMT
I would expect a row per log entry to be substantially faster to query.
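Here is a minimal in-memory sketch of the row-per-log-entry layout. It is plain Python, not the Cassandra client API. The row key format `logger:timestamp`, the 12-digit zero padding, and the names `composite_key` and `RowPerEntryStore` are illustrative assumptions. The sketch models an order-preserving partitioner by keeping row keys sorted, so a time range for one logger becomes a contiguous key scan.

```python
import bisect

# Illustrative assumption: 12 digits is enough for Unix-second timestamps
# such as 1278009131.
TS_WIDTH = 12


def composite_key(logger: str, ts: int) -> str:
    """Build a hypothetical row key 'logger:timestamp'.

    Zero-padding makes lexical (byte) order agree with numeric order, which is
    the order an order-preserving partitioner keeps rows in. Assumes logger
    names never contain ':' and timestamps are non-negative.
    """
    return f"{logger}:{ts:0{TS_WIDTH}d}"


class RowPerEntryStore:
    """In-memory stand-in for one CF holding one row per log entry (not a client API)."""

    def __init__(self) -> None:
        self._keys: list[str] = []         # row keys, kept sorted (models OPP order)
        self._rows: dict[str, dict] = {}   # row key -> columns of that log entry

    def insert(self, logger: str, ts: int, columns: dict) -> None:
        key = composite_key(logger, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def range_query(self, logger: str, start: int, end: int) -> list[tuple[int, dict]]:
        """Return (timestamp, columns) pairs with start <= ts <= end, in timestamp order."""
        lo = bisect.bisect_left(self._keys, composite_key(logger, start))
        hi = bisect.bisect_right(self._keys, composite_key(logger, end))
        return [
            (int(key.rsplit(":", 1)[1]), self._rows[key])
            for key in self._keys[lo:hi]
        ]
```

The range scan only works because the keys are stored in sorted order. Under RandomPartitioner, row keys are hashed, so this contiguous scan is not available. That is why the thread notes that the composite-key approach needs OrderPreservingPartitioner.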

2010/7/5 Bartosz Kołodziej <bartosz.kolodziej@gmail.com>:
> I have a large and dynamic number of loggers.
> According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB
> row-size limit is no longer an issue in 0.7 (btw, Mnesia has a similar issue ;-) ).
> I think I can go with the SVN release for now.
> Solving this with a composite key (logger+timestamp) would require
> OrderPreservingPartitioner for efficient range queries, while with the first
> approach I can use RandomPartitioner (data would be partitioned by
> logger: simple and effective).
> Btw, which model provides faster queries?
> (I only need to get a slice (timestamp1 to timestamp2) of data for logger X.)
> On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> You don't want to have all the data from a single logger in a single
>> row b/c of the 2GB size limit.
>>
>> If you have a small, static number of loggers you could create one CF
>> per logger and use timestamp as the row key.  Otherwise use a
>> composite key (logger+timestamp) as the key in a single CF.
>>
>> 2010/7/2 Bartosz Kołodziej <bartosz.kolodziej@gmail.com>:
>> > I'm new to Cassandra, and I want to use it to store:
>> > loggers = { // (super)ColumnFamily ?
>> >     logger1 : { // row inside super CF ?
>> >         timestamp1 : {
>> >             value : 10
>> >         },
>> >         timestamp2 : {
>> >             value : 12
>> >         }
>> >         (many many many more)
>> >     }
>> >     logger2 : { // logger of a different type (in this example it logs 3 values instead of 1)
>> >         timestamp1 : {
>> >             v : 300,
>> >             c : 123,
>> >             s : 12.13
>> >         },
>> >         timestamp2 : {
>> >             v : 300,
>> >             c : 123,
>> >             s : 12.13
>> >         }
>> >         (many many many more)
>> >     }
>> >     (many many many more)
>> > }
>> > The only way I will be accessing this data is:
>> > - example: fetch a slice of data from logger2 (start = 1278009131 (timestamp), end = 1278109131),
>> >      expecting a sorted array of data.
>> > - example: fetch a slice of data from logger2, logger10, logger20 and logger1234 (start = 1278009131 (timestamp), end = 1278109131),
>> >      expecting a map of sorted arrays of data. [It is basically N queries of the first type.]
>> > Is this the right definition for the above?
>> >     <ColumnFamily CompareWith="TimeUUIDType" ColumnType="Super"
>> >         CompareSubcolumnsWith="BytesType" Name="loggers"/>
>> > What's the best way to model this data in Cassandra (keeping in mind
>> > partitioning and other important stuff)?
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>
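The one-row-per-logger layout from the original question can be sketched as a minimal in-memory model in plain Python (not the Cassandra client API). The class and method names (`WideRowStore`, `slice_one`, `slice_many`) are hypothetical. The sketch assumes column names are integer Unix timestamps kept in sorted order. Under that assumption, a time range for one logger is a contiguous column slice, and the multi-logger query is simply N such slices gathered into a map.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Hypothetical in-memory model of one wide row per logger (not a client API).

    The row key (logger name) is what RandomPartitioner would hash to place the
    row; inside the row, column names (here integer timestamps) stay sorted,
    so a time range is a contiguous column slice.
    """

    def __init__(self) -> None:
        # logger -> sorted list of column names (timestamps)
        self._names: dict[str, list[int]] = defaultdict(list)
        # logger -> timestamp -> logged values for that entry
        self._values: dict[str, dict[int, dict]] = defaultdict(dict)

    def insert(self, logger: str, ts: int, values: dict) -> None:
        row = self._values[logger]
        if ts not in row:
            bisect.insort(self._names[logger], ts)
        row[ts] = values

    def slice_one(self, logger: str, start: int, finish: int) -> list[tuple[int, dict]]:
        """Columns with start <= ts <= finish from one row, sorted by timestamp."""
        names = self._names.get(logger, [])
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        return [(ts, self._values[logger][ts]) for ts in names[lo:hi]]

    def slice_many(
        self, loggers: list[str], start: int, finish: int
    ) -> dict[str, list[tuple[int, dict]]]:
        """Second access pattern: N single-row slices collected into a map."""
        return {logger: self.slice_one(logger, start, finish) for logger in loggers}
```

Every entry for a logger lands in the same row, so the row grows without bound. That is the layout to which the 2GB row-size limit discussed in this thread applies.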



