cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: How to store unique visitors in cassandra
Date Tue, 31 Mar 2015 15:53:22 GMT
Hi Laing, I think you answered the wrong mail =).

This thread is about modeling unique visitors (UV) over custom ranges.

But I am happy that you agree with my last message about the datacenter
switch.

C*heers

2015-03-31 16:29 GMT+02:00 Laing, Michael <michael.laing@nytimes.com>:

> We use Alain's solution as well to make major operational revisions.
>
> We have a "red team" and a "blue team" in each AWS region, so we just add
> and drop datacenters to get where we want to be.
>
> Pretty simple.
>
> ml
>
> On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
>> People keep asking me if we finally found a solution (even though this
>> thread is 3+ years old), so I will just update it with our findings.
>>
>> We finally managed to do this in our big data and reporting stacks by
>> storing blobs that hold HLL (HyperLogLog) structures. HLL is an algorithm
>> used by Google, Twitter and many others to solve count-distinct problems.
>> Structures built with this algorithm can be "summed" (merged) and give a
>> good approximation of the UV number.
>>
>> The precision you reach depends on the size of the structure you choose
>> (predictable precision). You can reach a fairly acceptable approximation
>> with small data structures.
>>
>> So we basically store one HLL per hour and just "sum" the HLLs for all the
>> hours within the requested range (you can do it at day level or any other
>> level depending on your needs).
>>
>> I hope this will help some of you; we finally had this (good) idea after
>> more than 3 years. We have actually been using HLL for a long time, but
>> the idea of storing HLL structures instead of counts allows us to query
>> custom ranges (at the price of more intelligence in the reporting stack,
>> which must read and smartly sum the HLLs stored as blobs). We have been
>> happy with it ever since.
>>
>> C*heers,
>>
>> Alain
>>
>> 2012-01-19 22:21 GMT+01:00 Milind Parikh <milindparikh@gmail.com>:
>>
>>> You might want to look at the code in countandra.org, regardless of
>>> whether you use it. It uses a model of dynamic composite keys (although
>>> static composite keys would have worked as well). For the actual query,
>>> only one row is hit. This of course only works because the data model is
>>> attuned to the query.
>>>
>>> Regards
>>> Milind
>>>
>>>
>>> On Jan 19, 2012 1:31 AM, "Alain RODRIGUEZ" <arodrime@gmail.com> wrote:
>>>
>>> Hi, thanks for your answer, but I don't want to add another layer on top
>>> of Cassandra. I have also built my whole application without Countandra
>>> and I would like to continue this way.
>>>
>>> Furthermore there is a Cassandra modeling problem that I would like to
>>> solve, and not just hide.
>>>
>>> Alain
>>>
>>>
>>>
>>> 2012/1/18 Lucas de Souza Santos <lucasdss@gmail.com>
>>> >
>>> > Why not http://www.countandra.org/
>>> >
>>> >
>>> > ...
>>>
>>>
>>
>
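
The approach described in the quoted message (one HLL blob per hour, merged
over the requested range) can be illustrated with a minimal sketch. This is
not the code used by the author: the class HyperLogLog, the parameter p, the
helper unique_visitors and the choice of SHA-1 as hash function are all
assumptions made for illustration. It relies on two standard HyperLogLog
properties: two structures are merged by taking the register-wise maximum,
and with m = 2**p registers the relative standard error is roughly
1.04 / sqrt(m), so about 1.6% for p = 12 (4096 one-byte registers).

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with m = 2**p one-byte registers (illustrative only)."""

    def __init__(self, p=12, registers=None):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p
        if registers is None:
            self.registers = bytearray(self.m)
        else:
            if len(registers) != self.m:
                raise ValueError("blob size does not match 2**p registers")
            self.registers = bytearray(registers)

    def add(self, item):
        # Build a 64-bit hash from the first 8 bytes of SHA-1 (hypothetical choice).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)               # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # The "sum" of two HLLs is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("cannot merge HLLs of different sizes")
        merged = bytes(max(a, b) for a, b in zip(self.registers, other.registers))
        return HyperLogLog(self.p, merged)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw

    def to_blob(self):
        # Bytes that could be written to a blob column, one value per hour.
        return bytes(self.registers)


def unique_visitors(hourly_blobs, p=12):
    """Merge the per-hour blobs of a range and estimate distinct visitors."""
    total = HyperLogLog(p)
    for blob in hourly_blobs:
        total = total.merge(HyperLogLog(p, blob))
    return total.count()


# Usage: two overlapping hours; visitors seen in both are counted only once.
hour_1 = HyperLogLog()
hour_2 = HyperLogLog()
for i in range(1000):
    hour_1.add(f"user{i}")
for i in range(500, 1500):
    hour_2.add(f"user{i}")
print(round(unique_visitors([hour_1.to_blob(), hour_2.to_blob()])))
```

Because merging only needs the stored registers, any set of hours can be
combined at read time, which is what makes custom ranges possible. Adding
plain per-hour counts would instead count returning visitors several times.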
