incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency
Date Mon, 07 Nov 2011 10:18:12 GMT
Hi again.

Did you receive my mail? This is the first time I have used this mailing
list.

If you received it, has anybody faced this problem?

It looks like this subject is going to be discussed at the Cassandra NYC
meeting.

http://www.datastax.com/2011/11/joe-stein-of-medialets-to-speak-at-cassandra-nyc

Any idea what they are going to say about this subject, or do I have to
wait? Will the video recording of this conference be public?

thanks,

Alain

2011/11/4 Alain RODRIGUEZ <arodrime@gmail.com>

> Hi all,
>
> I started this thread in the phpCassa Google group, but I think it
> belongs here.
>
> Here is my first post:
>
> "I was wondering about a specific point of Cassandra Modeling.
>
> If I need to know the number of connections to my website from each
> browser, every hour, I can do:
>
> Row key: $browser, column key: date('YmdH', $timestamp), value: counter.
>
> I can increment this counter on every visit, and this should work. The
> point is that I want to be able to display results filtered by many
> different statistics.
>
> I mean, I will have information such as browser, browser version, screen
> resolution, OS, OS version, location... and I want to allow users to
> get data (number of views), filtering it as much as they want.
>
> For example, if I want to know how many people visited my website with
> Safari, on Windows, and from New York, every hour, I can store:
>
> Row key : $browser:$os:$localization, column key : date('YmdH',
> $timestamp), value : counter.
>
> This can't be the best solution because, according to combinatorics, I
> will have to store n! counters to be able to store data with all
> filters. If I have 10 filters, I will increment 3,628,800 counters.
>
> That's not a good solution, for sure. How am I supposed to model this to
> be able to read data with any filter I want?
>
> Thanks,
>
> Alain"
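
The key layout in this first post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a nested dictionary stands in for a Cassandra counter column family, `record_visit` and `hour_bucket` are hypothetical helper names, and the hour bucket is computed in UTC (PHP's date() would use the server's timezone).

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a counter column family (an assumption, not a
# Cassandra client): row key -> column key (hour bucket) -> counter value.
counters = defaultdict(lambda: defaultdict(int))


def hour_bucket(timestamp):
    """Rough equivalent of PHP date('YmdH', $timestamp), computed in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H")


def record_visit(browser, os_name, localization, timestamp):
    """Increment the counter stored under $browser:$os:$localization."""
    row_key = f"{browser}:{os_name}:{localization}"
    counters[row_key][hour_bucket(timestamp)] += 1


ts = 1320661092  # 2011-11-07 10:18:12 UTC
record_visit("safari", "windows", "new-york", ts)
record_visit("safari", "windows", "new-york", ts + 60)  # same hour bucket

print(counters["safari:windows:new-york"]["2011110710"])
```

Reading one hour for one exact browser, OS and location is then a single lookup; any other filter combination needs its own row key, which is the problem discussed in the rest of this message.
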
>
>
>
> And here is the first answer I received (thanks to Tyler Hobbs):
>
> "Technically, the number of potential different counters would be the
> cardinality of each field multiplied together.  (Since one of the fields
> holds a time, this number would continue to grow.) However, in practice
> you'll have far fewer than this number of counters, because not every
> possible combination of these will happen.
>
> > That's not a good solution, for sure. How am I supposed to model
> > this to be able to read data with any filter I want?
>
> It's a reasonable solution if you want to be able to drill down and filter
> by any attribute.  If you want to be able to filter based on all of these
> attributes, you have to store that information about every request in one
> way or another."
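
The different counts mentioned so far can be compared with a little arithmetic. The sketch below assumes that the order of dimensions inside a key does not matter, so a checkbox selection is a subset of dimensions rather than an ordering; the cardinalities are made-up numbers used only for illustration.

```python
from math import factorial, prod

n_dimensions = 10

# Figure quoted in the first post: the number of orderings of 10 dimensions.
orderings = factorial(n_dimensions)

# Number of non-empty subsets of 10 dimensions, i.e. the checkbox
# selections a user can make. If every selection is pre-aggregated, this is
# also the number of counters touched by a single event.
selections = 2 ** n_dimensions - 1

# Hypothetical cardinalities, for illustration only.
cardinalities = {"browser": 5, "os": 4, "location": 50}

# Upper bound on row keys for the full browser:os:location key: the
# cardinality of each field multiplied together.
full_key_rows = prod(cardinalities.values())

# Upper bound on row keys across all selections: each dimension is either
# pinned to one of its values or left out; the empty selection is dropped.
all_selection_rows = prod(c + 1 for c in cardinalities.values()) - 1

print(orderings, selections, full_key_rows, all_selection_rows)
```

Under these assumptions the per-event cost grows as 2^n - 1 rather than n!, and the number of rows that actually exist is bounded by the products above, with fewer in practice because not every combination occurs.
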
>
>
>
> I know it's a non-trivial problem, but I'm sure that some people have
> already faced it before me.
>
> I'll allow users to filter however they want, choosing dimensions with
> checkboxes. They will be able to combine dimensions and ask for any
> combination.
>
> So, with this solution, I will have to store every event n times, with n =
> number of possible combinations.
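
A minimal sketch of that "store every event once per combination" approach follows, assuming a fixed canonical order of dimensions so that each selection maps to exactly one row key. The dictionary again stands in for a counter column family, and `keys_for_event`, `record` and `query` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

# Fixed canonical order, so a given selection always yields the same key.
DIMENSIONS = ("browser", "os", "location")

# Stand-in for a counter column family: row key -> hour bucket -> count.
counters = defaultdict(lambda: defaultdict(int))


def make_key(values, dims):
    """Build a row key such as 'browser=safari|location=new-york'."""
    return "|".join(f"{d}={values[d]}" for d in dims)


def keys_for_event(event):
    """Yield one row key per non-empty subset of dimensions."""
    for size in range(1, len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            yield make_key(event, dims)


def record(event, hour):
    """Write path: increment 2^n - 1 counters for a single event."""
    for key in keys_for_event(event):
        counters[key][hour] += 1


def query(filters, hour):
    """Read path: one lookup, whatever combination of filters is chosen."""
    dims = [d for d in DIMENSIONS if d in filters]
    return counters.get(make_key(filters, dims), {}).get(hour, 0)


hour = "2011110710"
record({"browser": "safari", "os": "windows", "location": "new-york"}, hour)
record({"browser": "safari", "os": "macos", "location": "paris"}, hour)

print(query({"browser": "safari"}, hour))
print(query({"browser": "safari", "location": "new-york"}, hour))
```

The trade-off is visible here: reads stay a single lookup for any filter combination, while each event costs 7 increments with 3 dimensions and would cost 1,023 with 10.
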
>
> I saw this yesterday: http://t.co/EXL6yAO8 (thanks to Dave Gardner).
> This company seems to do something equivalent to the idea described in
> my first post....
>
> Does anyone have experience to share with this kind of problem?
>
> thank you,
>
> Alain
>
>
