cassandra-user mailing list archives

From Guy Incognito <>
Subject Re: Mass deletion -- slowing down
Date Mon, 14 Nov 2011 08:01:18 GMT
I think what he means is: do you know what the 'oldest' day is? E.g., if
you have a rolling window of, say, two weeks, structure your query so
that your slice range only goes back two weeks, rather than to the
beginning of time. This avoids iterating over all the tombstones from
before the two-week window. It won't work if you are deleting arbitrary
days in the middle of your date range.
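A minimal, hypothetical sketch of the rolling-window idea, in Python since pycassa is already in use. It does not touch Cassandra or pycassa at all: it models one wide row as a sorted dict with ISO-date column names, where days older than the window have been deleted but survive as tombstones until compaction removes them. `build_row`, `slice_row` and `TOMBSTONE` are made-up names for illustration only. The point is just that a slice starting at the window's first day never steps over the older tombstones.

```python
from bisect import bisect_left
from datetime import date, timedelta

TOMBSTONE = object()  # marker for a deleted column not yet compacted away (toy model)

def build_row(today, total_days, window_days):
    """Hypothetical wide row: one column per day, ISO date strings as names.
    Days older than the window were deleted and remain only as tombstones."""
    row = {}
    for offset in range(total_days):
        day = (today - timedelta(days=offset)).isoformat()
        row[day] = TOMBSTONE if offset >= window_days else f"data for {day}"
    # ISO dates sort lexicographically in chronological order
    return dict(sorted(row.items()))

def slice_row(row, start=""):
    """Scan columns in sorted order from `start`; return the live values and
    how many tombstones the scan had to step over to find them."""
    names = list(row)
    live, tombstones_scanned = [], 0
    for name in names[bisect_left(names, start):]:
        if row[name] is TOMBSTONE:
            tombstones_scanned += 1
        else:
            live.append(row[name])
    return live, tombstones_scanned

today = date(2011, 11, 14)
row = build_row(today, total_days=365, window_days=14)
window_start = (today - timedelta(days=13)).isoformat()  # first day still in the window
unbounded = slice_row(row)                     # "from the beginning of time"
bounded = slice_row(row, start=window_start)   # only the last two weeks
print(len(unbounded[0]), unbounded[1])  # 14 351
print(len(bounded[0]), bounded[1])      # 14 0
```

Both slices return the same 14 live days; the unbounded one just has to walk past 351 tombstones first, which is the cost the bounded slice range avoids.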

On 14/11/2011 02:02, Maxim Potekhin wrote:
> Thanks Peter,
> I'm not sure I entirely follow. By the oldest data, do you mean the
> primary key corresponding to the limit of the time horizon?
> Unfortunately, unique IDs and the timestamps do not correlate, in the
> sense that chronologically "newer" entries might have a smaller
> sequential ID. That's because the timestamp corresponds to the last
> update, which is stochastic: the jobs can take anywhere from seconds
> to days to complete. As I said, I'm not sure I understood you
> correctly.
> Also, I note that queries on different dates (i.e. not "contaminated"
> with lots of tombstones) work just fine, which is consistent with the
> picture that has emerged so far.
> Theoretically, would compaction or cleanup help?
> Thanks
> Maxim
> On 11/13/2011 8:39 PM, Peter Schuller wrote:
>>> I do limit the number of rows I'm asking for in Pycassa. Queries on
>>> primary keys still work fine,
>> Is it feasible in your situation to keep track of the oldest possible
>> data (for example, if there is a single sequential writer that rotates
>> old entries away, it could keep a record of what the oldest might be)
>> so that you can bound your index lookup to >= that value (and avoid
>> the tombstones)?
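Peter's suggestion can be sketched the same way, again as a hypothetical in-memory model rather than anything in Cassandra or pycassa: a single sequential writer hands out increasing integer IDs, rotates old entries away (leaving tombstones), and records the oldest ID that might still be live. Readers then bound their lookup to keys >= that mark. `RotatingStore` and its methods are invented for illustration. As Maxim points out, this only helps when key order tracks the order in which entries are rotated away, which his sequential IDs do not.

```python
class RotatingStore:
    """Hypothetical single sequential writer. Keys are increasing integer IDs;
    rotated-away entries are kept as None to stand in for tombstones."""

    def __init__(self):
        self.rows = {}         # id -> value, or None for a tombstone
        self.next_id = 0
        self.oldest_live = 0   # low-water mark recorded by the writer

    def append(self, value):
        self.rows[self.next_id] = value
        self.next_id += 1

    def rotate(self, keep):
        """Delete all but the newest `keep` entries and advance the mark."""
        cutoff = max(self.next_id - keep, 0)
        for key in range(self.oldest_live, cutoff):
            self.rows[key] = None
        self.oldest_live = max(self.oldest_live, cutoff)

    def scan(self, lower_bound=0):
        """Return live values with key >= lower_bound, plus tombstones visited."""
        live, tombstones = [], 0
        for key in sorted(k for k in self.rows if k >= lower_bound):
            if self.rows[key] is None:
                tombstones += 1
            else:
                live.append(self.rows[key])
        return live, tombstones

store = RotatingStore()
for i in range(1000):
    store.append(f"job {i}")
store.rotate(keep=100)
live, tombstones = store.scan()                    # unbounded lookup
print(len(live), tombstones)  # 100 900
live, tombstones = store.scan(lower_bound=store.oldest_live)  # bounded by the mark
print(len(live), tombstones)  # 100 0
```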
