incubator-cassandra-user mailing list archives

From Ian Holsman <had...@holsman.net>
Subject Re: need some help with counters
Date Thu, 16 Jun 2011 18:29:12 GMT

On Jun 13, 2011, at 5:10 AM, aaron morton wrote:

>> I am wondering how to index on the most recent hour as well (i.e. a "show me the top 5 URLs" type of query).
> 
> AFAIK that's not a great application for counters. You would need range support in the secondary indexes so you could get the first X rows ordered by a column value.
> 
> To be honest, depending on scale, I'd consider a sorted set in redis for that. 

It does.
Thanks, Aaron.
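
For the archives, here is a rough sketch of the sorted-set approach with redis-py. The key names and hour buckets are made up, and the zincrby argument order differs between redis-py releases, so treat it as a sketch rather than drop-in code:

import redis

r = redis.Redis(host='localhost', port=6379)

def record_hit(url, hour_bucket):
    # one sorted set per hour, e.g. "top:20110609T18", scored by hit count
    # (redis-py 3.x argument order: name, amount, value)
    r.zincrby('top:%s' % hour_bucket, 1, url)

def top_urls(hour_bucket, n=5):
    # highest-scored members first, with their counts
    return r.zrevrange('top:%s' % hour_bucket, 0, n - 1, withscores=True)

Expiring each hourly set (EXPIRE) keeps the Redis side from growing without bound.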

> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11 Jun 2011, at 00:36, Ian Holsman wrote:
> 
>> 
>> On Jun 9, 2011, at 10:04 PM, aaron morton wrote:
>> 
>>> I may be missing something, but could you use a column for each of the last 48 hours, all in the same row for a URL?
>>> 
>>> e.g. 
>>> {
>>> 	"/url.com/hourly" : {
>>> 		"20110609T01:00:00" : 456,
>>> 		"20110609T02:00:00" : 4567,
>>> 	}
>>> }
>> 
>> Yes, that would work better. I was storing all the different times in the same row:
>> {
>> 	"/url.com" : {
>> 	 "H-20110609T01:00:00" : 456,
>> 	 "H-20110609T02:00:00" : 4567,
>> 	 "D-20110609" : 5678,
>> 	}
>> }
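
As an aside, with 0.8 counters the write path for either layout is just an add on the current hour's column. A minimal sketch with a pycassa-style client (the keyspace and counter CF names here are made up):

from datetime import datetime
import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
hourly = pycassa.ColumnFamily(pool, 'hourly_hits')  # counter CF, one row per URL

def record_hit(url):
    # column name is the current hour bucket, e.g. "20110609T02:00:00"
    hour = datetime.utcnow().strftime('%Y%m%dT%H:00:00')
    hourly.add(url, hour)  # increments the counter column by 1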
>> 
>> I am wondering how to index on the most recent hour as well (i.e. a "show me the top 5 URLs" type of query).
>> 
>>> 
>>> Increment the current hour only. Delete the older columns either when a read detects there are old values or as a maintenance job. Or as part of writing values for the first 5 minutes of any hour.
>> 
>> Yes, I thought of that. The problem with doing it on read is there may be a case where an old URL never gets read, so it will just sit there taking up space. The maintenance job is the route I went down.
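
A rough sketch of what that maintenance job could look like with pycassa (the CF name, the 48-hour window and the column-count cap are assumptions; a real job would walk all rows with get_range() instead of taking a single URL):

from datetime import datetime, timedelta
import pycassa
from pycassa import NotFoundException

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
hourly = pycassa.ColumnFamily(pool, 'hourly_hits')

def prune_old_hours(url, keep_hours=48):
    cutoff = (datetime.utcnow() - timedelta(hours=keep_hours)).strftime('%Y%m%dT%H:00:00')
    try:
        # hour-bucket column names sort lexically, so everything up to the cutoff is stale
        stale = hourly.get(url, column_finish=cutoff, column_count=1000)
    except NotFoundException:
        return
    hourly.remove(url, columns=list(stale.keys()))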
>> 
>>> 
>>> The row will get spread out over a lot of SSTables, which may reduce read speed. If this is a problem, consider a separate CF with more aggressive GC and compaction settings.
>> 
>> Thanks!
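
In case it's useful to anyone else, here is what tweaking those settings on a separate CF might look like through pycassa's SystemManager (keyspace/CF names and the values are only placeholders, not recommendations):

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('localhost:9160')
# shorter gc_grace and more eager compaction for the hourly-counter CF only
sys_mgr.alter_column_family('Keyspace1', 'hourly_hits',
                            gc_grace_seconds=3600,
                            min_compaction_threshold=2,
                            max_compaction_threshold=8)
sys_mgr.close()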
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 10 Jun 2011, at 09:28, Ian Holsman wrote:
>>> 
>>>> So would doing something like storing it in reverse (so I know what to delete) work? Or is storing a million columns in a supercolumn impossible?
>>>> 
>>>> I could always use a logfile and run the archiver off that as a worst case, I guess.
>>>> Would doing so many deletes screw up the db/cause other problems?
>>>> 
>>>> ---
>>>> Ian Holsman - 703 879-3128
>>>> 
>>>> I saw the angel in the marble and carved until I set him free -- Michelangelo
>>>> 
>>>> On 09/06/2011, at 4:22 PM, Ryan King <ryan@twitter.com> wrote:
>>>> 
>>>>> On Thu, Jun 9, 2011 at 1:06 PM, Ian Holsman <hadoop@holsman.net> wrote:
>>>>>> Hi Ryan.
>>>>>> You wouldn't have your version of Cassandra up on GitHub, would you?
>>>>> 
>>>>> No, and the patch isn't in our version yet either. We're still working on it.
>>>>> 
>>>>> -ryan
>>> 
>> 
> 

