I started this thread in the phpCassa Google group, but I think it belongs here.
Here is my first post:
"I was wondering about a specific point of Cassandra Modeling.
If I need to know the number of connections to my website per browser, every hour, I can do:
Row key: $browser, column key: date('YmdH', $timestamp), value: counter.
I can increment this counter on every visit, and this should work. The point is that I want to be able to render the results for many statistics used as filters.
I mean, I will have information such as browser, browser version, screen resolution, OS, OS version, location... and I want to let users get the data (number of views), filtering it as much as they want.
For example, if I want to know how many people visited my website with Safari, on Windows, from New York, every hour, I can store:
Row key : $browser:$os:$localization, column key : date('YmdH', $timestamp), value : counter.
This can't be the best solution, because to serve every possible filter combination I would have to maintain one counter per subset of filters: 2^n - 1 of them for n filters. With 10 filters, I would increment 1,023 counters on every visit.
That's not a good solution, for sure. How am I supposed to model this to be able to read data with any filter I want?"
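To make the key layout from the post concrete, here is a minimal sketch in Python. It models the counter column family as nested in-memory dictionaries rather than calling phpCassa or Cassandra. The names `counters`, `hour_bucket` and `record_visit` are hypothetical, and the hour bucket is computed in UTC, which is an assumption (PHP's date() uses the server's configured timezone).

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a counter column family:
# row key -> column key -> count. Not a real Cassandra client.
counters = defaultdict(lambda: defaultdict(int))

def hour_bucket(timestamp):
    """Column key for the hour containing `timestamp` (Unix seconds, UTC).

    Mirrors the shape of date('YmdH', $timestamp) from the post.
    """
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H")

def record_visit(browser, os_name, location, timestamp):
    """Increment the hourly counter for one browser:os:location row."""
    row_key = f"{browser}:{os_name}:{location}"
    counters[row_key][hour_bucket(timestamp)] += 1
    return row_key

t = 1325419200  # 2012-01-01 12:00:00 UTC
record_visit("safari", "windows", "new_york", t)
record_visit("safari", "windows", "new_york", t + 60)  # same hour
print(dict(counters["safari:windows:new_york"]))  # {'2012010112': 2}
```

This layout answers exactly one question (browser + OS + location per hour). Any other combination of filters needs its own row key.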
And here is the first answer I got (thanks to Tyler Hobbs):
"Technically, the number of potential different counters would be the cardinality of each field multiplied together. (Since one of the fields holds a time, this number would continue to grow.) However, in practice you'll have far fewer than this number of counters, because not every possible combination of these will happen.
> That's not a good solution, for sure. How am I supposed to model
> this to be able to read data with any filter I want?
It's a reasonable solution if you want to be able to drill down and filter by any attribute. If you want to be able to filter based on all of these attributes, you have to store that information about every request in one way or another."
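The distinction in this answer between the potential and the actual number of counters can be illustrated with a small sketch. The `visits` sample below is made up purely for illustration, and the time field is left out to keep it short.

```python
from math import prod

# Hypothetical sample of (browser, os, location) visits.
visits = [
    ("safari", "windows", "new_york"),
    ("safari", "macos", "new_york"),
    ("firefox", "windows", "paris"),
    ("safari", "windows", "new_york"),
]

# Upper bound: the cardinality of each field multiplied together.
cardinalities = [len({visit[i] for visit in visits}) for i in range(3)]
potential = prod(cardinalities)

# In practice, a counter row only exists for combinations that occurred.
actual = len(set(visits))

print(cardinalities, potential, actual)  # [2, 2, 2] 8 3
```

Here 8 rows are possible but only 3 are ever created, which is the effect described above: real traffic covers a small part of the full cross product.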
I know it's a non-trivial problem, but I'm sure that some people have already faced it before me.
I'll allow users to filter however they want, choosing dimensions with checkboxes. They will be able to combine dimensions and ask for any combination.
So, with this solution, I will have to store every event n times, with n = the number of possible combinations.
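A rough sketch of what this write fan-out looks like, assuming the order of dimensions in a key does not matter. Each dimension is either part of the row key or not, so an event with n dimensions touches 2^n - 1 non-empty combinations. The `name=value` key format and the `fanout_row_keys` helper are hypothetical, chosen so that keys built from different subsets cannot collide.

```python
from itertools import combinations

def fanout_row_keys(event):
    """Return one row key per non-empty subset of the event's dimensions."""
    # Sort the names so a given subset always yields the same key.
    names = sorted(event)
    keys = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            keys.append("|".join(f"{name}={event[name]}" for name in subset))
    return keys

event = {"browser": "safari", "os": "windows", "location": "new_york"}
keys = fanout_row_keys(event)
print(len(keys))  # 7, i.e. 2**3 - 1
print(keys[0])    # browser=safari
```

With 10 dimensions the same function returns 1,023 keys, so every visit would mean 1,023 counter increments per time bucket.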
I saw this yesterday: http://t.co/EXL6yAO8
(thanks to Dave Gardner). This company seems to do something equivalent to the idea described in my first post...
Does anyone have experience to share with this kind of problem?