cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: User click count
Date Mon, 29 Dec 2014 13:40:30 GMT
Hi Ajay,

Here is a good explanation you might want to read.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

We have been using counters for 3 years now, starting with C* 0.8, and
we are happy with them. The limits I can see for each approach are:

Counters:

- Accuracy, indeed. The error tends to be small in our use case (< 5%, when
the business allows 10%, so fair enough for us). We also recount them
through a batch processing tool (Spark / Hadoop, a kind of lambda
architecture), so our real-time stats are inaccurate and after a few minutes
or hours we have the real value.
- Read-before-write model, which is an anti-pattern. It makes you use more
machines due to the pressure involved, which is affordable for us too.
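
To illustrate the "fast counter now, exact recount later" idea from the first
point, here is a minimal in-memory sketch. It is not Cassandra code: the
ClickStats class, its method names and the event ids are all hypothetical, and
the replayed increment is just one assumed way a counter can drift.

```python
from collections import defaultdict


class ClickStats:
    """In-memory stand-in for a counter table plus a raw-event table (hypothetical)."""

    def __init__(self):
        self.counters = defaultdict(int)    # fast path: one number per user
        self.raw_events = defaultdict(set)  # slow path: one unique id per click

    def record_click(self, user_id, event_id):
        # Store the raw event (idempotent: same id twice is kept once)
        # and bump the real-time counter.
        self.raw_events[user_id].add(event_id)
        self.counters[user_id] += 1

    def replay_increment(self, user_id):
        # Assumed failure mode: an increment is retried after a timeout
        # and ends up being applied twice.
        self.counters[user_id] += 1

    def recount(self, user_id):
        # Batch step: recompute the exact value from raw data and
        # overwrite the drifted counter.
        exact = len(self.raw_events[user_id])
        self.counters[user_id] = exact
        return exact


stats = ClickStats()
for event_id in ("e1", "e2", "e3"):
    stats.record_click("ajay", event_id)
stats.replay_increment("ajay")   # counter now overcounts by one

drifted = stats.counters["ajay"]
exact = stats.recount("ajay")
print(drifted, exact)
```

In a real setup the recount would run in the batch tool over the raw table,
and the real-time value stays approximate until that job has run.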

Raw data (counted):

- Space used (can become quite impressive very fast, depending on your
business)!
- Time to answer a request (we expose the data to customers, and they don't
want to wait 10 sec for Cassandra to read 1 000 000+ columns).
- Performance in O(n) (linear) instead of O(1) (constant). Customers won't
always understand that for you a count of 1 000 000 is harder to read than a
count of 1, since from their side it is a single number in both cases, and
your interface will have very unstable read times.
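
One way to soften the O(n) read cost of raw data is to pre-aggregate clicks
into time buckets, so that a report reads one value per bucket instead of one
per click. The sketch below is plain Python, not Cassandra code; the
hourly_rollup function and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_rollup(click_times):
    """Collapse raw click timestamps into one count per hour."""
    buckets = Counter()
    for ts in click_times:
        # Truncate each timestamp to the start of its hour.
        buckets[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return buckets


# Sample data (assumed): three clicks spread over two hours.
clicks = [
    datetime(2014, 12, 29, 13, 5, tzinfo=timezone.utc),
    datetime(2014, 12, 29, 13, 40, tzinfo=timezone.utc),
    datetime(2014, 12, 29, 14, 1, tzinfo=timezone.utc),
]
rollup = hourly_rollup(clicks)
for hour, count in sorted(rollup.items()):
    print(hour.isoformat(), count)
```

Daily or monthly figures can then be summed from the hourly buckets, which
keeps read time tied to the number of buckets rather than the number of clicks.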

Pick the best solution (or combination) for your use case. These lists of
disadvantages are not exhaustive, just things that came to mind right now.

C*heers

Alain

2014-12-29 13:33 GMT+01:00 Ajay <ajay.garga@gmail.com>:

> Hi,
>
> So you mean to say counters are not accurate? (It is highly likely that
> multiple parallel threads will try to increment the counter as users click
> the links).
>
> Thanks
> Ajay
>
>
> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen <janne.jalkanen@ecyrd.com>
> wrote:
>
>>
>> Hi!
>>
>> It's really a tradeoff between accurate and fast and your read access
>> patterns; if you need it to be fairly fast, use counters by all means, but
>> accept the fact that they will (especially in older versions of cassandra
>> or adverse network conditions) drift off from the true click count.  If you
>> need accurate, use a timeuuid and count the rows (this is fairly safe for
>> replays too).  However, if using timeuuids your storage will need lots of
>> space; and your reads will be slow if the click counts are huge (because
>> Cassandra will need to read every item).  Using counters makes it easy to
>> just grab a slice of the time series data and shove it to a client for
>> visualization.
>>
>> You could of course do a hybrid system; use timeuuids and then
>> periodically count and add the result to a regular column, and then remove
>> the columns.  Note that you might want to optimize this so that you don't
>> end up with a lot of tombstones, e.g. by bucketing the writes so that you
>> can delete everything with just a single partition delete.
>>
>> At Thinglink some of the more important counters that we use are backed
>> up by the actual data. So for speed purposes we always use counters for
>> reads, but there's a repair process that fixes the counter value if we
>> suspect it starts drifting off the real data too much.  (You might be able
>> to tell that we've been using counters for quite some time :-P)
>>
>> /Janne
>>
>> On 29 Dec 2014, at 13:00, Ajay <ajay.garga@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Is it better to use a counter for the user click count than to create a
>> new row keyed as user id : timestamp for each click and count the rows?
>> >
>> > Basically we want to track the user clicks and use the same for
>> hourly/daily/monthly report.
>> >
>> > Thanks
>> > Ajay
>>
>>
>
