Subject: Re: Mass deletion -- slowing down
From: Peter Schuller <scode@scode.org>
To: user@cassandra.apache.org
Date: Sun, 13 Nov 2011 18:44:54 -0800

> I'm not sure I entirely follow. By the oldest data, do you mean the
> primary key corresponding to the limit of the time horizon? Unfortunately,
> unique IDs and the timestamps do not correlate, in the sense that
> chronologically "newer" entries might have a smaller sequential ID. That's
> because the timestamp corresponds to the last update, which is stochastic
> in the sense that jobs can take from seconds to days to complete. As I
> said, I'm not sure I understood you correctly.

I was hoping there would be a "wave of deletions" that matched the order of
the index (whatever is being read that is subject to the tombstones). If
not, then my suggestion doesn't apply. Are you using Cassandra secondary
indexes or maintaining your own index, by the way?

> Theoretically -- would compaction or cleanup help?

Not directly. The only way to eliminate tombstones is for them to (1)
expire according to gc grace seconds (again, see
http://wiki.apache.org/cassandra/DistributedDeletes) and then (2) be
removed by compaction. So while decreasing the gc grace period might
mitigate the problem somewhat, I would advise against going that route,
since it doesn't solve the fundamental problem and it can be dangerous: gc
grace has the usual implications for how often anti-entropy/repair must be
run, and a cluster that is super-sensitive to a small grace time becomes a
lot more volatile if, for example, you have repair problems and must
temporarily increase gc grace. It seems better to figure out some way of
structuring the data so that the reads in question do not suffer from this
problem.
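For intuition, here is a toy model of that delete/compaction interaction
(plain Python, *not* Cassandra's actual implementation; ToyColumnFamily
and its methods are made up for illustration): a delete only records a
tombstone, and compaction may only purge tombstones older than gc grace.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's historical default: 10 days

class ToyColumnFamily:
    """Hypothetical sketch of tombstone bookkeeping, not real Cassandra code."""

    def __init__(self):
        self.columns = {}     # column name -> value
        self.tombstones = {}  # column name -> deletion timestamp (seconds)

    def delete(self, name, now):
        # A delete does not reclaim space; it adds a tombstone that
        # subsequent reads must scan past until compaction removes it.
        self.columns.pop(name, None)
        self.tombstones[name] = now

    def compact(self, now):
        # Compaction may only drop tombstones older than gc grace, so
        # that every replica has had a chance to learn of the delete.
        self.tombstones = {
            n: ts for n, ts in self.tombstones.items()
            if now - ts < GC_GRACE_SECONDS
        }

cf = ToyColumnFamily()
cf.columns = {"job-%d" % i: "done" for i in range(5)}
for name in list(cf.columns):
    cf.delete(name, now=0)

cf.compact(now=1)                       # too soon: all 5 tombstones remain
print(len(cf.tombstones))               # -> 5
cf.compact(now=GC_GRACE_SECONDS + 1)    # past gc grace: tombstones purged
print(len(cf.tombstones))               # -> 0
```

This is why shrinking gc grace "works" in the toy model but is risky in
practice: the grace window exists for replica convergence, not performance.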
Note that reading individual columns should still scale well despite
tombstones, as should slicing, as long as the slices you're reading are
reasonably dense (in terms of the data-to-tombstone ratio) even if the
surrounding data is sparse. How many entries are you reading per query? I
have been presuming it's the index read that is causing the timeout,
rather than the reading of the individual matching columns, since the
maximum "per-column" penalty when reading individual columns is finite
regardless of the sparsity of the data.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)