incubator-cassandra-user mailing list archives

From Maxim Potekhin <>
Subject Re: Mass deletion -- slowing down
Date Mon, 14 Nov 2011 02:02:05 GMT
Thanks Peter,

I'm not sure I entirely follow. By the oldest data, do you mean the
primary key corresponding to the limit of the time horizon? Unfortunately,
the unique IDs and the timestamps do not correlate, in the sense that
"newer" entries might have a smaller sequential ID. That's because the
timestamp corresponds to the last update, which is stochastic: the jobs
can take anywhere from seconds to days to complete. As I said, I'm not
sure I understood you.

Also, I note that queries on different dates (i.e. not "contaminated"
with lots of tombstones) work just fine, which is consistent with the
picture that has emerged so far.

Theoretically -- would compaction or cleanup help?



On 11/13/2011 8:39 PM, Peter Schuller wrote:
>> I do limit the number of rows I'm asking for in Pycassa. Queries on primary
>> keys still work fine,
> Is it feasible in your situation to keep track of the oldest possible
> data (for example, if there is a single sequential writer that rotates
> old entries away, it could keep a record of what the oldest might be)
> so that you can bound your index lookup >= that value (and avoid the
> tombstones)?
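A toy model of the suggestion above may make it concrete. This is a minimal sketch, not Cassandra's actual read path and not the Pycassa API: the index is modeled as a sorted Python list of (key, is_tombstone) pairs, where "key" stands for whatever value the index is ordered on (for example the update timestamp rather than the sequential ID). The helpers expire_before and range_scan are hypothetical names. A scan that starts at the beginning has to step over every tombstone before it reaches live rows; a scan bounded below by the tracked oldest value skips them.

```python
import bisect
from dataclasses import dataclass

# Toy model (assumption): a sorted index of (key, is_tombstone) pairs.
# Deleted entries stay in place as tombstones, so a range scan must
# still step over them until something physically removes them.


@dataclass
class ScanResult:
    live: list            # live keys returned by the scan
    tombstones_seen: int  # tombstones stepped over along the way


def expire_before(entries, horizon):
    """Hypothetical rotating writer: tombstone every key < horizon.

    Returns the new watermark, i.e. the oldest key that may still be live.
    """
    for idx, (key, _) in enumerate(entries):
        if key < horizon:
            entries[idx] = (key, True)
    return horizon


def range_scan(entries, start, limit):
    """Return up to `limit` live keys >= start, counting skipped tombstones."""
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, start)  # jump straight to the lower bound
    live, skipped = [], 0
    while i < len(entries) and len(live) < limit:
        key, dead = entries[i]
        if dead:
            skipped += 1
        else:
            live.append(key)
        i += 1
    return ScanResult(live, skipped)


entries = [(k, False) for k in range(10_000)]
watermark = expire_before(entries, 9_000)

full = range_scan(entries, 0, 10)             # unbounded: starts at the oldest key
bounded = range_scan(entries, watermark, 10)  # bounded: starts at the watermark
print(full.tombstones_seen, bounded.tombstones_seen)  # 9000 0
```

Both scans return the same ten live keys (9000 to 9009), but only the unbounded one steps over 9,000 tombstones. In this model, compaction would correspond to physically removing the tombstone entries from the list; in Cassandra itself, compaction only purges a tombstone once gc_grace_seconds has elapsed.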
