cassandra-user mailing list archives

From Nate McCall <n...@datastax.com>
Subject Re: custom reconciling columns?
Date Tue, 28 Jun 2011 23:18:35 GMT
I agree with Aaron's suggestion on data model and query here. Since
there is a time component, you can split the row on a fixed duration
for a given user, so the row key would become userId_[timestamp
rounded to day].

This gives you an easy way to roll up the information for the date
ranges you need, since the key suffix can be computed without a read.
It also spreads the read load across the cluster rather than hitting
only one row's replicas, since you have 30 rows in this case (one per
day for a month) instead of one.
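
A rough sketch of the key scheme, in case it helps. The function names
and the choice of Unix seconds rounded down to midnight UTC are my own
assumptions for illustration, not anything Cassandra or a client library
provides; the point is only that every row key for a date range can be
built on the client without reading anything first.

```python
SECONDS_PER_DAY = 86400  # assumes timestamps are Unix seconds, bucketed in UTC


def day_bucket(ts: int) -> int:
    """Round a Unix timestamp down to the start of its UTC day."""
    return ts - (ts % SECONDS_PER_DAY)


def row_key(user_id: str, ts: int) -> str:
    """Row key of the form userId_[timestamp rounded to day]."""
    return f"{user_id}_{day_bucket(ts)}"


def row_keys_for_range(user_id: str, start_ts: int, end_ts: int) -> list:
    """All daily row keys covering start_ts..end_ts, inclusive."""
    first = day_bucket(start_ts)
    last = day_bucket(end_ts)
    return [
        f"{user_id}_{bucket}"
        for bucket in range(first, last + SECONDS_PER_DAY, SECONDS_PER_DAY)
    ]
```

A write uses row_key(user_id, visit_ts) for the visit being stored, and
a month's read issues a multiget over the 30 or so keys returned by
row_keys_for_range, which hash to different nodes.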

On Tue, Jun 28, 2011 at 5:55 PM, aaron morton <aaron@thelastpickle.com> wrote:
> Can you provide some more info:
> - how big are the rows, e.g. number of columns and column size  ?
> - how much data are you asking for ?
> - what sort of read query are you using ?
> - what sort of numbers are you seeing ?
> - are you deleting columns or using TTL ?
> I would consider issues with the data churn, data model and query before
> looking at serialisation.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 29 Jun 2011, at 10:37, Yang wrote:
>
> I can see that as my user history grows, the read time grows proportionally
> (or faster than linearly).
> If my business requirements ask me to keep a month's history for each user,
> it could become too slow. I was suspecting that it's actually the
> serializing and deserializing that's taking time (I can definitely say it's
> CPU bound).
>
>
> On Tue, Jun 28, 2011 at 3:04 PM, aaron morton <aaron@thelastpickle.com>
> wrote:
>>
>> There is no facility to do custom reconciliation for a column. An append
>> style operation would run into many of the same problems as the Counter
>> type, e.g. not every node may get an append and there is a chance for lost
>> appends unless you go to all the trouble Counters do.
>>
>> I would go with using a row for the user and columns for each item. Then
>> you can have fast writes that need no read (no look-up) first.
>>
>> What problems are you seeing with the reads ?
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29 Jun 2011, at 04:20, Yang wrote:
>>
>> > For example, say I have an application that needs to read a user's
>> > browsing history, and I model the user ID as the key,
>> > with the history data inside the row. With the current approach, I could
>> > model each visit as a column.
>> > The possible issue is that *possibly* (I'm still doing a lot of
>> > profiling on this to verify) a lot of time is spent on serialization
>> > into the message and out of the
>> > message. Plus, I do not need the full features provided by the column:
>> > for example, I do not need a timestamp on each visit, etc.
>> > So it might be faster to put the entire history in a blob, where each
>> > visit only takes up a few bytes, and
>> > my code manipulates the blob.
>> >
>> > problem is, I still need to avoid the read-before-write, so I send only
>> > the latest visit, and let cassandra do the reconcile, which appends the
>> > visit to the blob, so this needs custom reconcile behavior.
>> >
>> > is there a way to incorporate such custom reconcile under current code
>> > framework? (I see custom sorting, but no custom reconcile)
>> >
>> > thanks
>> > yang
>>
>
>
>
