cassandra-user mailing list archives

From samal <samalgo...@gmail.com>
Subject Re: RE Ordering counters in Cassandra
Date Tue, 22 May 2012 07:24:54 GMT
In some cases Cassandra is really good and in some cases it is not.

As I understand your approach, you are recording all of your events under a
single key, is that right? That is not recommended. The row can grow very
large, and if you have a cluster of servers, all requests will hit only one
server and overwhelm it, while the rest sit idle and take a nap.

What I would do is figure out which kinds of events are similar, and then
bucket by those kinds.

E.g., if an event comes from iOS or Android, I would bucket by an IOS key and
an android key, so the counters under each key give me all events that
occurred from iOS or from Android.

Keys can also be concatenated to filter more deeply: IOS#safari,
android#chrome.
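
A minimal sketch of the bucketing idea, in Python. The names bucket_key and
record are hypothetical, and the nested dictionary is only an in-memory
stand-in for counter rows; it does not talk to Cassandra. It assumes "#" as
the separator, as in the keys above.

```python
from collections import Counter, defaultdict

def bucket_key(*parts: str, sep: str = "#") -> str:
    """Join non-empty parts into a row key such as 'IOS#safari'."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty key part is required")
    return sep.join(cleaned)

# In-memory stand-in for counter rows: {row_key: {event_name: count}}
counters = defaultdict(Counter)

def record(platform: str, browser: str, event: str) -> None:
    """Count the event under the coarse bucket and the finer one."""
    counters[bucket_key(platform)][event] += 1
    counters[bucket_key(platform, browser)][event] += 1

record("IOS", "safari", "click")
record("IOS", "chrome", "click")
print(dict(counters["IOS"]))         # counts for the whole platform
print(dict(counters["IOS#safari"]))  # counts for one platform/browser pair
```

Spreading events over several row keys like this is what lets different
servers own different buckets.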

Having fewer columns per row will help build the reverse index more efficiently.

/Samal

On Mon, May 21, 2012 at 11:53 PM, Tamar Fraenkel <tamar@tok-media.com> wrote:

> Indeed, I took the no-delete approach. If time bucket rows are not that
> big, this is a good temporary solution.
> I just finished implementation and testing now on a small staging
> environment. So far so good.
> Tamar
>
> Sent from my iPod
>
> On May 21, 2012, at 9:11 PM, Filippo Diotalevi <filippo@ntoklo.com> wrote:
>
> Hi Tamar,
> the solution you propose is indeed a "temporary solution", but it might be
> the best one.
>
> Which approach did you follow?
> I'm a bit concerned about the deletion approach, since in case of
> concurrent writes on the same counter you might "lose" the pointer to the
> column to delete.
>
> --
> Filippo Diotalevi
>
>
> On Monday, 21 May 2012 at 18:51, Tamar Fraenkel wrote:
>
> I also had a similar problem. I have a temporary solution, which is not the
> best, but may be of help.
> I have the counter CF to count events, but apart from that I keep a leaders
> CF:
>
> leaders = {
>   // key is time bucket
>   // values are composites(rank, event) ordered by
>   // descending order of the rank
>   // set relevant TTL on columns
>   time_bucket1 : {
>     composite(1000,event1) : ""
>     composite(999, event2) : ""
>   },
>   ...
> }
>
> Whenever I increment counter for a specific event, I add a column in the
> time bucket row of the leaders CF, with the new value of the counter and
> the event name.
> There are two ways to go here: either delete the old column(s) for that
> event (with lower counts) from the leaders CF, or let them be.
> If you choose to delete, there is the complication of not having getAndSet
> for counters, so you may end up not deleting all the old columns.
> If you choose not to delete old columns, and live with duplicate columns
> for events (each with a different count), it will make your query to
> retrieve leaders run longer.
> Anyway, when you need to retrieve the leaders, you can do a slice query on
> the leaders CF and ignore duplicate events in the client (I use Java). There
> will be fewer duplicates if you do delete old columns.
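
A minimal sketch of the client-side de-duplication described above, in Python
rather than Java. The function name top_leaders is hypothetical, and the list
of (rank, event) pairs stands in for the columns a slice over one time-bucket
row would return, already in descending rank order. It assumes counters are
only ever incremented, so the first column seen for an event carries its
latest count.

```python
from typing import Iterable, List, Tuple

def top_leaders(columns: Iterable[Tuple[int, str]], limit: int) -> List[Tuple[int, str]]:
    """Return up to `limit` (rank, event) pairs, one per event.

    `columns` must be sorted by descending rank; later columns for an
    event already seen are stale duplicates and are skipped.
    """
    if limit <= 0:
        return []
    seen = set()
    leaders = []
    for rank, event in columns:
        if event in seen:
            continue  # older column with a lower count for the same event
        seen.add(event)
        leaders.append((rank, event))
        if len(leaders) == limit:
            break  # stop reading once enough distinct events are found
    return leaders

row = [(1000, "event1"), (999, "event2"), (998, "event1"),
       (500, "event3"), (499, "event2")]
print(top_leaders(row, 3))  # [(1000, 'event1'), (999, 'event2'), (500, 'event3')]
```

The more stale columns a row holds, the further the loop has to read before
it finds enough distinct events, which is the longer query time mentioned
above.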
>
> Another option is not to use Cassandra for that purpose; http://redis.io/ is
> a nice tool.
>
> Will be happy to hear your comments.
> Thanks,
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> tamar@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>
> On Mon, May 21, 2012 at 8:05 PM, Filippo Diotalevi <filippo@ntoklo.com> wrote:
>
> Hi Romain,
> thanks for your suggestion.
>
> When you say "build every day a ranking in a dedicated CF by iterating
> over events", do you mean:
> - load all the columns for the specified row key
> - iterate over each column, and write a new column in the inverted index
> ?
>
> That's my current approach, but since I have many of these wide rows (1
> per day), the process is extremely slow, as it involves moving an entire row
> from Cassandra to the client, inverting every column, and sending the data
> back to create the inverted index.
>
> --
> Filippo Diotalevi
>
>
> On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:
>
>
> If I understand correctly, you've got a data model which looks like this:
>
> CF Events:
>     "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }
>
> You can't query on column values but you can build every day a ranking in
> a dedicated CF by iterating over events:
>
> create column family Ranking
>     with comparator = 'LongType(reversed=true)'
>     ...
>
> CF Ranking:
>     "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }
>
> Then you can make a "top ten" or whatever you want because counter values
> will be sorted.
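
A minimal sketch of the inversion step in Python, working on a plain
dictionary rather than on Cassandra. The function name build_ranking is
hypothetical. Note one assumption that differs from the Ranking layout above:
the sketch keeps (count, event) pairs instead of using the count alone as the
key, because two events with the same count would otherwise share one column
name and overwrite each other.

```python
import heapq
from typing import Dict, List, Tuple

def build_ranking(counters: Dict[str, int], top_n: int) -> List[Tuple[int, str]]:
    """Return the top_n (count, event) pairs, highest count first."""
    pairs = ((count, event) for event, count in counters.items())
    # nlargest avoids sorting the whole row when only a few leaders are needed
    return heapq.nlargest(top_n, pairs)

events = {"event1": 1050, "event2": 1200, "event3": 830}
print(build_ranking(events, 2))  # [(1200, 'event2'), (1050, 'event1')]
```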
>
>
> Filippo Diotalevi <filippo@ntoklo.com> wrote on 21/05/2012 16:59:43:
>
> > Hi,
> > I'm trying to understand what's the best design for a simple
> > "ranking" use case.
> > I have, in a row, a good number (10k - a few 100k) of counters; each
> > one is counting the occurrences of an event. At the end of the day, I
> > want to create a ranking of the most frequent events.
> >
> > What's the best approach to perform this task?
> > The brute force approach of retrieving the row and ordering it
> > doesn't work well (the call usually times out, especially if
> > Cassandra is also under load); I also don't know in advance the full
> > set of event names (column names), so it's difficult to slice the get
> > call.
> >
> > Is there any trick to solve this problem? Maybe a way to retrieve
> > the row ordered by counter values?
> >
> > Thanks,
> > --
> > Filippo Diotalevi
>
>
>
>
>
