incubator-cassandra-user mailing list archives

From Filippo Diotalevi <fili...@ntoklo.com>
Subject Re: RE Ordering counters in Cassandra
Date Mon, 21 May 2012 17:05:28 GMT
Hi Romain,  
thanks for your suggestion.

When you say "build every day a ranking in a dedicated CF by iterating over events", do you
mean:
- load all the columns for the specified row key
- iterate over each column, and write a new column in the inverted index
?

That's my current approach, but since I have many of these wide rows (1 per day), the process
is extremely slow, as it involves moving an entire row from Cassandra to the client, inverting
every column, and sending the data back to create the inverted index.  
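
One way to reduce that cost, when only the top of the ranking is needed, is to read the row in small slices and keep just the N largest counters on the client, so the whole row is never held in memory or written back. The following is a minimal sketch under stated assumptions, not a Cassandra client: `ROW` is made-up sample data and `fetch_slice` is a hypothetical stand-in for a column slice query that returns columns ordered by name.

```python
import heapq
from bisect import bisect_right

# Hypothetical stand-in for one wide counter row: column name -> counter value.
ROW = {"event1": 1050, "event2": 1200, "event3": 830, "event4": 15, "event5": 990}
_SORTED_NAMES = sorted(ROW)  # a slice returns columns ordered by name


def fetch_slice(start_after, count):
    """Simulates a column slice: up to `count` columns whose names sort after `start_after`."""
    begin = bisect_right(_SORTED_NAMES, start_after) if start_after is not None else 0
    names = _SORTED_NAMES[begin:begin + count]
    return [(name, ROW[name]) for name in names]


def top_n(n, page_size=2):
    """Pages through the row and keeps only the n largest counters seen so far."""
    heap = []  # min-heap of (count, name); the smallest of the current top n is heap[0]
    last = None
    while True:
        page = fetch_slice(last, page_size)
        if not page:
            break
        for name, count in page:
            if len(heap) < n:
                heapq.heappush(heap, (count, name))
            elif (count, name) > heap[0]:
                heapq.heapreplace(heap, (count, name))
        last = page[-1][0]  # resume after the last column name seen
    return [(name, count) for count, name in sorted(heap, reverse=True)]


print(top_n(3))
```

Paging by "last column name seen" does not require knowing the event names in advance, and only the N winners would need to be written to a ranking CF.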

--  
Filippo Diotalevi



On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:

>  
> If I understand correctly, you've got a data model which looks like this:  
>  
> CF Events:  
>     "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }  
>  
> You can't query on column values but you can build every day a ranking in a dedicated
> CF by iterating over events:  
>  
> create column family Ranking  
>     with comparator = 'LongType(reversed=true)'    
>     ...  
>  
> CF Ranking:  
>     "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }  
>      
> Then you can make a "top ten" or whatever you want because counter values will be sorted.
>  
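
One caveat with the Ranking layout quoted above: column names are unique within a row, so two events with the same count would share one column and one of them would be lost. The sketch below illustrates this with plain Python dictionaries standing in for the row; the sample data and the composite `(count, event name)` column name are assumptions made for illustration and are not part of the original suggestion.

```python
# Hypothetical counter row, with a deliberate tie between event1 and event4.
events = {"event1": 1050, "event2": 1200, "event3": 830, "event4": 1050}

# Layout from the message: the count alone is the column name.
by_count = {}
for name, count in events.items():
    by_count[count] = name  # a second event with the same count overwrites the first

# Assumed alternative: a composite (count, event name) column name, empty value.
by_count_and_name = {(count, name): "" for name, count in events.items()}


def top(ranking, n):
    """Returns the first n column names in descending order, mimicking a reversed comparator."""
    return sorted(ranking, reverse=True)[:n]


print(len(by_count))              # 3 columns: one of the tied events was lost
print(top(by_count_and_name, 3))  # [(1200, 'event2'), (1050, 'event4'), (1050, 'event1')]
```

With the composite name, ties are kept and the "top ten" read is still a single slice from the start of the row.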
>  
> Filippo Diotalevi <filippo@ntoklo.com> wrote on 21/05/2012 16:59:43:
>  
> > Hi,  
> > I'm trying to understand what's the best design for a simple  
> > "ranking" use case.  
> > I have, in a row, a good number (10k - a few 100K) of counters; each
> > one is counting the occurrences of an event. At the end of the day, I  
> > want to create a ranking of the most frequent events.  
> >  
> > What's the best approach to perform this task?  
> > The brute-force approach of retrieving the row and ordering it  
> > doesn't work well (the call usually times out, especially if  
> > Cassandra is also under load); I also don't know in advance the full
> > set of event names (column names), so it's difficult to slice the get call.  
> >  
> > Is there any trick to solve this problem? Maybe a way to retrieve  
> > the row ordered by counter value?  
> >  
> > Thanks,  
> > --  
> > Filippo Diotalevi  

