> I am wondering how to index on the most recent hour as well. (ie show me top 5 URLs type query)..
AFAIK that's not a great application for counters. You would need range support in the secondary
indexes so you could get the first X rows ordered by a column value.
To be honest, depending on scale, I'd consider a sorted set in Redis for that.
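
A minimal sketch of the sorted-set idea, assuming one sorted set per hour with the URL as member and the hit count as score. It is plain Python standing in for Redis so it runs without a server; the class name, method names and the `hits:<hour>` key layout are made up for illustration. The comments name the Redis commands (ZINCRBY, ZREVRANGE) each step roughly corresponds to.

```python
import heapq
from collections import defaultdict


class HourlyTopUrls:
    """In-memory stand-in for one Redis sorted set per hour (hypothetical helper)."""

    def __init__(self):
        # hour key -> {url: hit count}; in Redis each hour would be its own sorted set
        self._hours = defaultdict(dict)

    def hit(self, hour, url, amount=1):
        # Roughly: ZINCRBY hits:<hour> <amount> <url>
        counts = self._hours[hour]
        counts[url] = counts.get(url, 0) + amount
        return counts[url]

    def top(self, hour, n=5):
        # Roughly: ZREVRANGE hits:<hour> 0 <n-1> WITHSCORES
        counts = self._hours.get(hour, {})
        return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


tracker = HourlyTopUrls()
for url, hits in [("/a", 3), ("/b", 10), ("/c", 7)]:
    tracker.hit("20110609T01", url, hits)
print(tracker.top("20110609T01", n=2))
```

With real Redis you would probably also expire or delete the per-hour keys once they fall out of the window you care about.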
Hope that helps.
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 11 Jun 2011, at 00:36, Ian Holsman wrote:
>
> On Jun 9, 2011, at 10:04 PM, aaron morton wrote:
>
>> I may be missing something, but could you use a column for each of the last 48 hours, all in the same row for a URL?
>>
>> e.g.
>> {
>> "/url.com/hourly" : {
>> "20110609T01:00:00" : 456,
>> "20110609T02:00:00" : 4567,
>> }
>> }
>
> yes.. that would work better... I was storing all the different times in the same row.
> {
> "/url.com" : {
> "H-20110609T01:00:00" : 456,
> "H-20110609T02:00:00" : 4567,
> "D-20110609" : 5678,
> }
> }
>
> I am wondering how to index on the most recent hour as well. (ie show me top 5 URLs type query)..
>
>>
>> Increment the current hour only. Delete the older columns either when a read detects there are old values, as a maintenance job, or as part of writing values for the first 5 minutes of any hour.
>
> yes.. I thought of that. The problem with doing it on read is there may be a case where an old URL never gets read.. so it will just sit there taking up space.. the maintenance job is the route I went down.
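
A rough sketch of the hourly-column layout and the maintenance-job cleanup discussed above, using a plain dict in place of a column family so it runs on its own. The function names and the 48-hour default are illustrative assumptions, and this is not Cassandra client code. It relies on the column names sorting lexicographically in time order.

```python
from datetime import datetime, timedelta

# Column names look like 20110609T01:00:00, one per hour
HOUR_FORMAT = "%Y%m%dT%H:00:00"


def hour_column(ts):
    """Column name for the hour containing ts."""
    return ts.strftime(HOUR_FORMAT)


def increment(rows, url, ts, amount=1):
    """Bump the counter for the hour containing ts in the row '<url>/hourly'."""
    row = rows.setdefault(url + "/hourly", {})
    col = hour_column(ts)
    row[col] = row.get(col, 0) + amount


def prune(rows, now, keep_hours=48):
    """Maintenance pass: drop hourly columns older than keep_hours, then empty rows."""
    cutoff = hour_column(now - timedelta(hours=keep_hours))
    removed = 0
    for key in list(rows):
        row = rows[key]
        old = [col for col in row if col < cutoff]
        for col in old:
            del row[col]
        removed += len(old)
        if not row:
            del rows[key]  # a URL that is never read still gets cleaned up here
    return removed


rows = {}
increment(rows, "/url.com", datetime(2011, 6, 9, 1, 15), 456)
increment(rows, "/old.com", datetime(2011, 6, 1, 0, 0), 1)
print(prune(rows, now=datetime(2011, 6, 11, 0, 30)), rows)
```

Because the pass walks every row, it also covers the case of a URL that is never read again.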
>
>>
>> The row will get spread out over a lot of SSTables, which may reduce read speed. If this is a problem consider a separate CF with more aggressive GC and compaction settings.
>
> Thanks!
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 10 Jun 2011, at 09:28, Ian Holsman wrote:
>>
>>> So would doing something like storing it in reverse (so I know what to delete) work? Or is storing a million columns in a supercolumn impossible?
>>>
>>> I could always use a logfile and run the archiver off that as a worst case I guess.
>>> Would doing so many deletes screw up the db/cause other problems?
>>>
>>> ---
>>> Ian Holsman - 703 879-3128
>>>
>>> I saw the angel in the marble and carved until I set him free -- Michelangelo
>>>
>>> On 09/06/2011, at 4:22 PM, Ryan King <ryan@twitter.com> wrote:
>>>
>>>> On Thu, Jun 9, 2011 at 1:06 PM, Ian Holsman <hadoop@holsman.net> wrote:
>>>>> Hi Ryan.
>>>>> you wouldn't have your version of cassandra up on github would you??
>>>>
>>>> No, and the patch isn't in our version yet either. We're still working on it.
>>>>
>>>> -ryan
>>
>