From: Jonathan Colby <jonathan.colby@gmail.com>
To: user@cassandra.apache.org
Subject: Re: simple question about merged SSTable sizes
Date: Wed, 22 Jun 2011 19:03:16 +0200

So the take-away is try to avoid major compactions at all costs!  Thanks Ed and Eric.

On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:

Yes, if you are not deleting fast enough they will grow. This is not specifically a Cassandra problem; /var/log/messages has the same issue.

There is a JIRA ticket about having a maximum size for SSTables, so they always stay manageable.

You fall into a small trap when you force major compaction: many small tables turn into one big one, and from there it is hard to get back to many smaller ones again. The other side of the coin is that if you do not major compact, you can end up with much more disk usage than live data (i.e., a large % of the disk is overwrites and tombstones).

You can tune the compaction rate now so compaction does not kill your IO. Generally I think avoiding really large SSTables is the best way to go: scale out and avoid very large SSTables per node if possible.
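
For reference, a minimal sketch of that throttle, assuming the 0.8-era compaction_throughput_mb_per_sec setting (the values here are illustrative and localhost is a placeholder):

    # cassandra.yaml -- caps total compaction I/O for the node;
    # 16 MB/s is the 0.8 default, 0 disables throttling entirely
    compaction_throughput_mb_per_sec: 16

    # or adjust at runtime without a restart:
    nodetool -h localhost setcompactionthroughput 16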

Edward


On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby <jonathan.colby@gmail.com> wrote:

The way compaction works, "x" same-sized files are merged into a new SSTable. This repeats itself, and the SSTables get bigger and bigger.
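
A back-of-the-envelope sketch of that growth, assuming size-tiered behavior with a merge threshold of 4 and fixed-size flushes (all numbers made up; real merges shrink by whatever overwrites and tombstones they drop):

    # Rough model: once THRESHOLD SSTables of roughly the same size
    # exist, they merge into one ~THRESHOLD-times-larger table.
    THRESHOLD = 4      # like min_compaction_threshold
    FLUSH_MB = 64      # assumed size of a freshly flushed SSTable

    def simulate(flushes):
        tiers = {}     # size in MB -> number of SSTables at that size
        for _ in range(flushes):
            size = FLUSH_MB
            tiers[size] = tiers.get(size, 0) + 1
            while tiers[size] >= THRESHOLD:   # cascade merges upward
                tiers[size] -= THRESHOLD
                size *= THRESHOLD
                tiers[size] = tiers.get(size, 0) + 1
        return sorted((s, n) for s, n in tiers.items() if n)

    print(simulate(1024))   # -> [(65536, 1)]: one 64 GB table
    # Each tier is 4x the last (64 MB, 256 MB, 1 GB, ...), so nothing
    # bounds the top tier except disk capacity.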

So what is the upper limit? If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?

I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.

Second, compacting such large files is an IO killer. What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?
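
For reference, this is the per-column-family knob meant by compaction_threshold; a hypothetical example using the 0.8-era cassandra-cli and nodetool (MyKeyspace and MyCF are placeholders, and the exact syntax may differ by version):

    # cassandra-cli: set the per-CF merge thresholds
    update column family MyCF with min_compaction_threshold = 4
        and max_compaction_threshold = 8;

    # runtime equivalent via nodetool:
    nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 4 8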

Thanks!

