incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Scalability question
Date Mon, 15 Aug 2011 20:27:49 GMT
This is more an artifact of repair's problems than compaction per se.
We're addressing these in
https://issues.apache.org/jira/browse/CASSANDRA-2816 and
https://issues.apache.org/jira/browse/CASSANDRA-2280.
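
[Editor's note: the minor-compaction bucketing referred to in the quoted message below can be sketched roughly as follows. The 0.5/1.5 ratio and the threshold of 4 mirror size-tiered compaction's defaults, but this is an illustration of the idea, not Cassandra's actual implementation.]

```python
# Rough sketch of size-tiered (minor) compaction bucketing:
# similarly-sized SSTables are grouped so they compact together,
# and a bucket is only compacted once it has enough members.
# The constants below are assumed defaults for illustration.
BUCKET_LOW, BUCKET_HIGH, MIN_THRESHOLD = 0.5, 1.5, 4

def bucket_sstables(sizes):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []  # each bucket: [running average, [member sizes]]
    for size in sorted(sizes):
        for b in buckets:
            # join a bucket if the size is close to its average
            if BUCKET_LOW * b[0] <= size <= BUCKET_HIGH * b[0]:
                b[1].append(size)
                b[0] = sum(b[1]) / len(b[1])  # update running average
                break
        else:
            buckets.append([size, [size]])
    # only buckets with enough members are candidates for compaction
    return [b[1] for b in buckets if len(b[1]) >= MIN_THRESHOLD]

# e.g. bucket_sstables([10, 11, 12, 13, 400]) groups the four small
# tables together and leaves the 400-unit table alone, so a minor
# compaction never has to re-read the big old table.
```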

On Mon, Aug 15, 2011 at 3:06 PM, Philippe <watcherfr@gmail.com> wrote:
>> It's another reason to avoid major / manual compactions, which create a
>> single big SSTable. Minor compactions keep things in buckets, which means
>> newer SSTables can be compacted without needing to read the bigger, older tables.
>
> I've never run a major/manual compaction on this ring.
> In my case running repair on a "big" keyspace results in SSTables piling up.
> My problematic node just filled up 483GB (yes, GB) of SSTTables. Here are
> the biggest
> ls -laSrh
> (...)
>
> -rw-r--r-- 1 cassandra cassandra  2.7G 2011-08-15 14:13 PUBLIC_MONTHLY_20-g-4581-Data.db
> -rw-r--r-- 1 cassandra cassandra  2.7G 2011-08-15 14:52 PUBLIC_MONTHLY_20-g-4641-Data.db
> -rw-r--r-- 1 cassandra cassandra  2.8G 2011-08-15 14:39 PUBLIC_MONTHLY_20-tmp-g-4878-Data.db
> -rw-r--r-- 1 cassandra cassandra  2.9G 2011-08-15 15:00 PUBLIC_MONTHLY_20-g-4656-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.0G 2011-08-15 14:17 PUBLIC_MONTHLY_20-g-4599-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.0G 2011-08-15 15:11 PUBLIC_MONTHLY_20-g-4675-Data.db
> -rw-r--r-- 3 cassandra cassandra  3.1G 2011-08-13 10:34 PUBLIC_MONTHLY_18-g-3861-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.2G 2011-08-15 14:41 PUBLIC_MONTHLY_20-tmp-g-4884-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.6G 2011-08-15 14:44 PUBLIC_MONTHLY_20-tmp-g-4894-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.8G 2011-08-15 14:56 PUBLIC_MONTHLY_20-tmp-g-4934-Data.db
> -rw-r--r-- 1 cassandra cassandra  3.8G 2011-08-15 14:46 PUBLIC_MONTHLY_20-tmp-g-4905-Data.db
> -rw-r--r-- 1 cassandra cassandra  4.0G 2011-08-15 14:57 PUBLIC_MONTHLY_20-tmp-g-4935-Data.db
> -rw-r--r-- 3 cassandra cassandra  5.9G 2011-08-13 12:53 PUBLIC_MONTHLY_19-g-4219-Data.db
> -rw-r--r-- 3 cassandra cassandra  6.0G 2011-08-13 13:57 PUBLIC_MONTHLY_20-g-4538-Data.db
> -rw-r--r-- 3 cassandra cassandra   12G 2011-08-13 09:27 PUBLIC_MONTHLY_20-g-4501-Data.db
>
> On the other nodes, the same directory is around 69GB. Why are there so
> many fewer large files there, and so many big ones on the repairing node?
> -rw-r--r-- 1 cassandra cassandra 434M 2011-08-15 16:02 PUBLIC_MONTHLY_17-g-3525-Data.db
> -rw-r--r-- 1 cassandra cassandra 456M 2011-08-15 15:50 PUBLIC_MONTHLY_19-g-4253-Data.db
> -rw-r--r-- 1 cassandra cassandra 485M 2011-08-15 14:30 PUBLIC_MONTHLY_20-g-5280-Data.db
> -rw-r--r-- 1 cassandra cassandra 572M 2011-08-15 15:15 PUBLIC_MONTHLY_18-g-3774-Data.db
> -rw-r--r-- 2 cassandra cassandra 664M 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Index.db
> -rw-r--r-- 2 cassandra cassandra 811M 2011-08-11 21:27 PUBLIC_MONTHLY_16-g-2597-Data.db
> -rw-r--r-- 2 cassandra cassandra 915M 2011-08-13 04:00 PUBLIC_MONTHLY_18-g-3695-Data.db
> -rw-r--r-- 1 cassandra cassandra 925M 2011-08-15 03:39 PUBLIC_MONTHLY_17-g-3454-Data.db
> -rw-r--r-- 1 cassandra cassandra 1.3G 2011-08-15 13:46 PUBLIC_MONTHLY_19-g-4199-Data.db
> -rw-r--r-- 2 cassandra cassandra 1.5G 2011-08-10 15:37 PUBLIC_MONTHLY_17-g-3218-Data.db
> -rw-r--r-- 1 cassandra cassandra 1.9G 2011-08-15 14:35 PUBLIC_MONTHLY_20-g-5281-Data.db
> -rw-r--r-- 2 cassandra cassandra 2.1G 2011-08-10 16:33 PUBLIC_MONTHLY_19-g-3946-Data.db
> -rw-r--r-- 2 cassandra cassandra 3.1G 2011-08-10 22:23 PUBLIC_MONTHLY_18-g-3509-Data.db
> -rw-r--r-- 2 cassandra cassandra 4.0G 2011-08-10 18:18 PUBLIC_MONTHLY_20-g-5024-Data.db
> -rw------- 2 cassandra cassandra 5.1G 2011-08-09 15:23 PUBLIC_MONTHLY_19-g-3847-Data.db
> -rw-r--r-- 2 cassandra cassandra 9.6G 2011-08-09 15:39 PUBLIC_MONTHLY_20-g-4893-Data.db
> This whole compaction thing is getting me worried: how are sites in
> production dealing with SSTables becoming larger and larger, and thus taking
> longer and longer to compact? Adding nodes every couple of weeks?
> Philippe
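
[Editor's note: the per-column-family totals being eyeballed from ls output above can be computed with a short script. This is a hypothetical helper, assuming the `<CF>-g-<gen>-Data.db` naming shown in the listings and that all files sit in a single data directory.]

```python
import os, re, collections

def data_sizes_by_cf(data_dir):
    """Total live *-Data.db bytes per column family, so nodes can be
    compared without eyeballing ls output."""
    totals = collections.Counter()
    # naming convention taken from the listings above: <CF>-g-<gen>-Data.db
    pat = re.compile(r'(.+)-g-\d+-Data\.db$')
    for name in os.listdir(data_dir):
        if '-tmp-' in name:
            continue  # skip in-progress compaction/repair output
        m = pat.match(name)
        if m:
            totals[m.group(1)] += os.path.getsize(os.path.join(data_dir, name))
    return dict(totals)
```

Running it on each node's data directory and diffing the results would show exactly which column families account for the 483GB-vs-69GB gap.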



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
