cassandra-user mailing list archives

From Alexander Konotop <alexander.kono...@gmail.com>
Subject Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency
Date Mon, 07 Nov 2011 10:53:23 GMT
On Mon, 7 Nov 2011 11:18:12 +0100
Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi again.
> 
> Did you receive my mail? This is the first time I have used this
> mailing list.
> 
> If you received it, has anybody faced this problem?
> 
> It looks like this subject is going to be discussed at the Cassandra
> NYC meeting.
> 
> http://www.datastax.com/2011/11/joe-stein-of-medialets-to-speak-at-cassandra-nyc
> 
> Any idea what they are going to say about this subject, or do I have
> to wait? Will the video recording of the conference be public?
> 
> thanks,
> 
> Alain
> 
> 2011/11/4 Alain RODRIGUEZ <arodrime@gmail.com>
> 
> > Hi all,
> >
> > I started this thread in the phpCassa Google group, but I think
> > it belongs here.
> >
> > Here is my first post:
> >
> > "I was wondering about a specific point of Cassandra modeling.
> >
> > If I need to know the number of connections to my website per
> > browser, every hour, I can do:
> >
> > Row key: $browser, column key: date('YmdH', $timestamp), value:
> > counter.
> >
> > I can increment this counter on every visit, and this should work.
> > The point is that I want to be able to report results filtered by
> > many different statistics.
> >
> > I mean, I will have information such as browser, browser version,
> > screen resolution, OS, OS version, location... And I want to
> > allow users to get data (number of views) filtered by as many of
> > these as they want.
> >
> > For example, if I want to know how many people visited my website
> > with Safari, on Windows, and from New York, every hour, I can store:
> >
> > Row key: $browser:$os:$localization, column key: date('YmdH',
> > $timestamp), value: counter.
> >
> > This can't be the best solution because, according to
> > combinatorics, I would have to store n! counters to be able to
> > store data for all filters. With 10 filters I would increment
> > 3 628 800 counters.
> >
> > That's not a good solution, for sure. How am I supposed to model
> > this so that I can read data with any filter I want?
> >
> > Thanks,
> >
> > Alain"
> >
> >
> >
> > And here is the first answer I received (thanks to Tyler Hobbs):
> >
> > "Technically, the number of potential different counters would be
> > the cardinality of each field multiplied together.  (Since one of
> > the fields holds a time, this number would continue to grow.)
> > However, in practice you'll have far fewer than this number of
> > counters, because not every possible combination of these will
> > happen.
> >
> > > That's not a good solution, for sure. How am I supposed to model
> > > this so that I can read data with any filter I want?
> >
> > It's a reasonable solution if you want to be able to drill down and
> > filter by any attribute.  If you want to be able to filter based on
> > all of these attributes, you have to store that information about
> > every request in one way or another."
> >
> >
> >
> > I know it's a non-trivial problem, but I'm sure that some people
> > have already faced it before me.
> >
> > I'll allow users to filter however they want, choosing dimensions
> > with checkboxes. They will be able to combine dimensions and ask
> > for any combination.
> >
> > So, with this solution, I will have to store every event n times,
> > with n = number of possible combinations.
> >
> > I saw this yesterday: http://t.co/EXL6yAO8 (thanks to Dave
> > Gardner). This company seems to do something equivalent to the
> > idea described in my first post.
> >
> > Does anyone have experience to share with this kind of problem?
> >
> > thank you,
> >
> > Alain
> >
> >

Looks like your mail has been received, but for now nobody has an
answer. As for me, I'm a Cassandra newbie and definitely can't help :-(

Best regards
Alexander
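
The per-combination counter scheme from the quoted post can be sketched
as follows. This is a minimal in-memory sketch in Python, not a
Cassandra client: the names counters, hour_bucket, row_keys and
record_visit are hypothetical, a Counter stands in for a counter column
family, and the name=value row key format is an assumption (the post
uses bare values such as $browser:$os:$localization). Under these
assumptions one visit with n dimensions touches 2^n - 1 counters, one
per non-empty subset of dimensions, which is 1023 for 10 dimensions
rather than 10! = 3 628 800.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a counter column family:
# maps (row_key, column_key) -> count.
counters = Counter()


def hour_bucket(timestamp):
    """Column key for the event's hour, like date('YmdH', $timestamp) in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y%m%d%H")


def row_keys(event):
    """Yield one row key per non-empty subset of the event's dimensions."""
    names = sorted(event)  # fixed order, so a given subset always maps to one key
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Assumption: name=value pairs, so different subsets cannot collide.
            yield ":".join(f"{name}={event[name]}" for name in subset)


def record_visit(event, timestamp):
    """Increment every counter a filtered query might read; return how many."""
    column = hour_bucket(timestamp)
    touched = 0
    for key in row_keys(event):
        counters[(key, column)] += 1
        touched += 1
    return touched


visit = {"browser": "safari", "os": "windows", "location": "new-york"}
touched = record_visit(visit, 1320663203)  # 2011-11-07 10:53:23 UTC

print(touched)  # 2**3 - 1 = 7 counters for 3 dimensions
print(counters[("browser=safari:os=windows", "2011110710")])  # 1
print(2**10 - 1, factorial(10))  # 1023 3628800
```

Reading a filtered count is then a single lookup on the key built from
the chosen dimensions. How many distinct rows accumulate over time is a
separate question: as Tyler Hobbs notes in the quoted answer, it depends
on the cardinality of each field and on which combinations actually
occur.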
