Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
From: aaron morton <aaron@thelastpickle.com>
Mime-Version: 1.0 (Apple Message framework v1244.3)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_AADA360B-3B11-4101-A42F-4AE7194AC533"
Subject: Re: Compaction and total disk space used for highly overwritten CF
Date: Thu, 6 Oct 2011 22:13:48 +1300
In-Reply-To: <9C47878B-4C57-42F8-906A-0EBAFCD04817@lacunasystems.com>
To: user@cassandra.apache.org
References: <9C47878B-4C57-42F8-906A-0EBAFCD04817@lacunasystems.com>
Message-Id: <4852F205-DD07-4BB3-8E6E-013E3CCB0632@thelastpickle.com>


--Apple-Mail=_AADA360B-3B11-4101-A42F-4AE7194AC533
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

You will only have tombstones in your data if you issue deletes.

What you are seeing is an artifact of the fundamental way Cassandra
stores data. Once data is written to disk it is never modified. If you
overwrite a column value that has already been committed to disk, the old
value is not changed. Instead the new value is held in memory and some
time later it is written to a new file (more info here
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/)
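
To make that concrete, here is a rough toy model in Python of the write
path described above. It is only a sketch: ToyColumnFamily, flush_every
and the dict layout are names I made up for illustration, not Cassandra
internals. It shows how overwriting the same keys keeps adding files on
disk rather than replacing the old values.

```python
# Toy model only: ToyColumnFamily, flush_every and the dict layout are
# invented for illustration and are not Cassandra internals.
class ToyColumnFamily:
    def __init__(self, flush_every):
        self.flush_every = flush_every  # writes allowed before a flush
        self.memtable = {}              # key -> (timestamp, value), in memory
        self.sstables = []              # flushed files, never modified again
        self.ops = 0
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.ops += 1
        # an overwrite only replaces the value held in memory
        self.memtable[key] = (self.clock, value)
        if self.ops >= self.flush_every:
            self.flush()

    def flush(self):
        if self.memtable:
            # the memtable becomes a new immutable file on disk
            self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.ops = 0

    def values_on_disk(self):
        return sum(len(table) for table in self.sstables)


cf = ToyColumnFamily(flush_every=100)
for _ in range(5):              # overwrite the same 100 keys five times
    for key in range(100):
        cf.write(key, "payload")

print(len(cf.sstables), cf.values_on_disk())
```

In this model there are only 100 live values, but the disk holds every
flushed copy until something merges the files.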

Compaction not only kersplats data that has been deleted, it kapows data
that has been overwritten. (See this link for a dramatic first person
re-creation of compaction removing an overwritten value
http://goo.gl/4TrB6 )
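
A minimal sketch of that merge step, again as a toy in Python and not
the real compaction code. Here each "SSTable" is assumed to be a dict of
key -> (timestamp, value), and the compact function name is made up for
the example.

```python
# Toy model only: each "SSTable" is a dict of key -> (timestamp, value).
# The function name and data layout are assumptions for illustration.
def compact(sstables):
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # keep only the most recently written value for each key
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return [merged]  # one new file replaces all of the inputs


old = {"row1": (1, "a"), "row2": (2, "b")}
new = {"row1": (3, "c")}        # an overwrite of row1 that was flushed later
result = compact([old, new])
print(result)
```

The overwritten value for row1 is dropped during the merge, which is
why the size on disk shrinks after a compaction even without any
deletes.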

By overwriting all the data so often you are somewhat fighting against
the server. But there are some things you can try (am assuming 0.8.6,
some general background
http://www.datastax.com/docs/0.8/operations/tuning)

* reduce the min_compaction_threshold on the CF so that data on disk
gets compacted more frequently.
* look at the logs to see why / when memtables are being flushed,
look for lines like

	INFO [ScheduledTasks:1] 2011-10-02 22:32:20,092 ColumnFamilyStore.java (line 1128) Enqueuing flush of Memtable-NoCache_Ascending@921142878(2175000/13267958 serialized/live bytes, 43500 ops)
	or
	WARN [ScheduledTasks:1] 2011-10-02 22:32:20,084 GCInspector.java (line 143) Heap is 0.778906484049155 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

* The memtable will be flushed to disk for 1 of 3 reasons:
	* The heap is too full and Cassandra wants to free memory
	* It has passed the memtable_operations CF threshold for changes,
	  increase this value to flush less
	* It has passed the memtable_throughput CF threshold for throughput,
	  increase this value to flush less
	(background
	http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/)

* if possible, reduce the amount of overwrites.
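
For the log check above, something like this rough Python sketch could
pull the flush and heap messages out of the log. The regular expressions
are based only on the two sample lines in this mail, so other versions
may log a different format, and the classify name is just something I
picked for the example.

```python
import re

# Patterns are assumptions derived from the two sample log lines above.
FLUSH = re.compile(
    r"Enqueuing flush of Memtable-(?P<cf>\w+)@\d+"
    r"\((?P<serialized>\d+)/(?P<live>\d+) serialized/live bytes, "
    r"(?P<ops>\d+) ops\)"
)
HEAP = re.compile(r"Heap is (?P<ratio>[\d.]+) full")


def classify(line):
    """Return a tuple describing a flush or heap warning, else None."""
    m = FLUSH.search(line)
    if m:
        return ("flush", m.group("cf"), int(m.group("live")), int(m.group("ops")))
    m = HEAP.search(line)
    if m:
        return ("heap_pressure", float(m.group("ratio")))
    return None


sample = [
    "INFO [ScheduledTasks:1] 2011-10-02 22:32:20,092 ColumnFamilyStore.java "
    "(line 1128) Enqueuing flush of Memtable-NoCache_Ascending@921142878"
    "(2175000/13267958 serialized/live bytes, 43500 ops)",
    "WARN [ScheduledTasks:1] 2011-10-02 22:32:20,084 GCInspector.java "
    "(line 143) Heap is 0.778906484049155 full. You may need to reduce "
    "memtable and/or cache sizes.",
]

for line in sample:
    print(classify(line))
```

Counting how many flushes sit next to a heap warning versus how many
carry a large ops or live bytes figure should give a hint as to which of
the three reasons is driving the flushes.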

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 2:42 PM, Derek Andree wrote:

> We have a very hot CF which we use essentially as a durable memory
> cache for our application.  It is about 70MBytes in size after being
> fully populated.  We completely overwrite this entire CF every few
> minutes (not delete).  Our hope was that the CF would stay around 70MB
> in size, but it grows to multiple Gigabytes in size rather quickly (less
> than an hour).  I've heard that doing major compactions using nodetool
> is no longer recommended, but when we force a compaction on this CF
> using nodetool compact, then perform GC, size on disk shrinks to the
> expected 70MB.
>
> I'm wondering if we are doing something wrong here, we thought we were
> avoiding tombstones since we are just overwriting each column using the
> same keys.  Is the fact that we have to do a GC to get the size on disk
> to shrink significantly a smoking gun that we have a bunch of
> tombstones?
>
> We've row cached the entire CF to make reads really fast, and writes
> are definitely fast enough, it's this growing disk space that has us
> concerned.
>
> Here's the output from nodetool cfstats for the CF in question (hrm, I
> just noticed that we still have a key cache for this CF which is rather
> dumb):
>
> 		Column Family: Test
> 		SSTable count: 4
> 		Space used (live): 309767193
> 		Space used (total): 926926841
> 		Number of Keys (estimate): 275456
> 		Memtable Columns Count: 37510
> 		Memtable Data Size: 15020598
> 		Memtable Switch Count: 22
> 		Read Count: 4827496
> 		Read Latency: 0.010 ms.
> 		Write Count: 1615946
> 		Write Latency: 0.095 ms.
> 		Pending Tasks: 0
> 		Key cache capacity: 150000
> 		Key cache size: 55762
> 		Key cache hit rate: 0.030557854052177317
> 		Row cache capacity: 150000
> 		Row cache size: 68752
> 		Row cache hit rate: 1.0
> 		Compacted row minimum size: 925
> 		Compacted row maximum size: 1109
> 		Compacted row mean size: 1109
>
>
> Any insight appreciated.
>
> Thanks,
> -Derek
>


--Apple-Mail=_AADA360B-3B11-4101-A42F-4AE7194AC533--