cassandra-user mailing list archives

From "Jason J. W. Williams" <jasonjwwilli...@gmail.com>
Subject Re: DTCS SSTable count issue
Date Mon, 11 Jul 2016 20:57:10 GMT
I can vouch for TWCS...we switched from DTCS to TWCS using Jeff's plugin w/
Cassandra 3.0.5 and just upgraded to 3.0.8 today and switched over to the
built-in version of TWCS.

-J

On Mon, Jul 11, 2016 at 1:38 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
wrote:

> DTCS is deprecated in favor of TWCS in new versions, yes.
>
>
>
> Worth mentioning that you can NOT disable blocking read repair which comes
> naturally if you use CL > ONE.
>
>
>
> > Also, instead of major compactions (which come with their own set of
> > issues / tradeoffs too), you can think of a script that uses
> > sstablemetadata to find the sstables holding too many tombstones and runs
> > single-SSTable compactions on them through JMX and user defined
> > compactions. Meanwhile, if you want to do it manually, you could use
> > something like this to get the tombstone ratio of the biggest sstables:
>
>
>
> The tombstone compaction options basically do this for you with the right
> settings (unchecked_tombstone_compaction = true, tombstone threshold at 85%
> or so; don’t try to get clever and set it very close to 99%, because the
> estimated tombstone ratio isn’t that accurate).
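
A minimal sketch of the settings described above, not a tested recipe. The keyspace `my_ks` and table `my_table` are hypothetical placeholders, and the sketch assumes the table keeps DateTieredCompactionStrategy. `tombstone_threshold` is expressed as a ratio, so 85% is written as 0.85. As far as I know, ALTER TABLE replaces the whole compaction map, so any other sub-options you rely on need to be repeated in the statement.

```shell
#!/bin/sh
# Build an ALTER TABLE statement enabling tombstone compactions.
# Arguments: keyspace, table, compaction class, tombstone threshold (ratio).
# The keyspace and table names used below are hypothetical placeholders.
build_alter() {
  printf "ALTER TABLE %s.%s WITH compaction = {'class': '%s', 'unchecked_tombstone_compaction': 'true', 'tombstone_threshold': '%s'};\n" \
    "$1" "$2" "$3" "$4"
}

# Print the statement for review before applying it.
build_alter my_ks my_table DateTieredCompactionStrategy 0.85

# To apply it, pass the statement to cqlsh, for example:
#   cqlsh -e "$(build_alter my_ks my_table DateTieredCompactionStrategy 0.85)"
```

Printing the statement first makes it easy to review before it touches the schema.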
>
>
>
> -          Jeff
>
>
>
>
>
> *From: *Alain RODRIGUEZ <arodrime@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, July 11, 2016 at 1:05 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: DTCS SSTable count issue
>
>
>
> @Jeff
>
>
>
> Rather than being an alternative, isn't your compaction strategy going to
> deprecate (and finally replace) DTCS? That was my understanding from the
> ticket CASSANDRA-9666.
>
> @Riccardo
>
>
>
> If you are interested in Jeff's TWCS, I believe it was actually introduced
> in 3.0.8, not 3.0.7:
> https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28
> Anyway, you can use it in any recent version, as compaction strategies are
> pluggable.
>
>
>
> What concerns me is that I have a high tombstone read count even though
> those are insert-only tables. Compacting the table makes the tombstone issue
> disappear. Yes, we are using TTL to expire data after 3 months and I have
> not touched the GC grace period.
>
>
>
> I observed the same issue recently and I am confident that TWCS will solve
> this tombstone issue, but I have not tested it on my side so far. Meanwhile,
> be sure you have disabled any "read repair" on tables using DTCS, and maybe
> hints as well. It is a hard decision to make, as you'll lose 2 out of 3
> anti-entropy systems, but DTCS behaves badly with those options turned on
> (TWCS is fine with them). The last anti-entropy system is full repair, which
> you might already not be running as you only do inserts...
>
>
>
> Also, instead of major compactions (which come with their own set of issues
> / tradeoffs too), you can think of a script that uses sstablemetadata to
> find the sstables holding too many tombstones and runs single-SSTable
> compactions on them through JMX and user defined compactions. Meanwhile, if
> you want to do it manually, you could use something like this to get the
> tombstone ratio of the biggest sstables:
>
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' && \
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' | \
> xargs sstablemetadata | grep tombstones
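
The script idea mentioned above could be sketched roughly like this. It assumes `sstablemetadata` is on the PATH and prints an "Estimated droppable tombstones:" line for each SSTable. The function names and the threshold are illustrative only.

```shell
#!/bin/sh
# Extract the ratio from one sstablemetadata report read on stdin.
droppable_ratio() {
  awk -F': *' '/^Estimated droppable tombstones/ {print $2}'
}

# Print "<ratio> <file>" for every Data.db file above the threshold.
# $1 = table data directory, $2 = threshold (e.g. 0.8)
list_candidates() {
  for f in "$1"/*-Data.db; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    ratio=$(sstablemetadata "$f" | droppable_ratio)
    awk -v r="$ratio" -v t="$2" -v f="$f" \
      'BEGIN { if (r + 0 > t + 0) print r, f }'
  done
}

# Example (placeholder path): list_candidates /path_to_a_table 0.8 | sort -n
```

Only the Data.db files are inspected here, unlike the `du` one-liner, which also lists the other component files in the directory.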
>
> And something like this to run a user defined compaction on the ones you
> chose (big sstable with high tombstone ratio):
>
> echo "run -b org.apache.cassandra.db:type=CompactionManager
> forceUserDefinedCompaction <Data_db_file_name_without_path>" | java -jar
> jmxterm-version.jar -l <ip>:<jmx_port>
>
> *note:* you have to download jmxterm (or use any other jmx tool).
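
To run the command above for several files, a loop along these lines might work. It is only a sketch: the jar path and the `<ip>:<jmx_port>` value are supplied by the caller, the function names are made up for illustration, and it has not been run against a real node.

```shell
#!/bin/sh
# Build the jmxterm command for one SSTable. Only the file name is passed,
# without its directory, as in the command quoted above.
jmx_command() {
  printf 'run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction %s\n' \
    "$(basename "$1")"
}

# Submit every Data.db path read from stdin, one per line.
# $1 = path to the jmxterm jar, $2 = <ip>:<jmx_port>
compact_all() {
  while IFS= read -r f; do
    jmx_command "$f" | java -jar "$1" -l "$2"
  done
}

# Example (placeholder values):
#   printf '%s\n' /path_to_a_table/<Data_db_file> | compact_all jmxterm-version.jar <ip>:<jmx_port>
```

Each file is submitted in its own jmxterm session, which is slow but keeps a failure on one file from affecting the others.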
>
>
>
> Have you tried unchecked_tombstone_compaction as well (a compaction option
> at the table level)? Feel free to set this one to true. I think it could be
> the default. It is safe as long as your machines have some spare resources
> available (not that much). That's the first thing I would do.
>
>
>
> Also, if you use TTL only, feel free to reduce gc_grace_seconds; this
> will probably help get tombstones removed. I would start with the other
> solutions first. Keep in mind that if someday you perform deletes, this
> setting could produce zombies (deleted data coming back) if you don't run
> repair across the entire ring within gc_grace_seconds.
>
> C*heers,
>
> -----------------------
>
> Alain Rodriguez - alain@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2016-07-07 19:25 GMT+02:00 Jeff Jirsa <jeff.jirsa@crowdstrike.com>:
>
> 48 sstables isn’t unreasonable in a DTCS table. The count will continue to
> grow over time, but ideally data will expire as it nears your 90 day TTL and
> those sstables should start dropping away as they age.
>
>
>
> 3.0.7 introduces an alternative to DTCS you may find easier to use called
> TWCS. It will almost certainly help address the growing sstable count.
>
>
>
>
>
>
>
> *From: *Riccardo Ferrari <ferrarir@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Thursday, July 7, 2016 at 6:49 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *DTCS SSTable count issue
>
>
>
> Hi everyone,
>
>
>
> This is my first question, so I apologize if I do something wrong.
>
>
>
> I have a small Cassandra cluster built on 3 nodes. It started as a 2.0.X
> cluster, was upgraded to 2.0.15, then 2.1.13, then 3.0.4, and recently to
> 3.0.6. Ubuntu is the OS.
>
>
>
> There are a few tables that use DateTieredCompactionStrategy and are
> suffering from a constantly growing SSTable count. I have the feeling this
> has something to do with the upgrade; however, I need some hints on how to
> debug this issue.
>
>
>
> Tables are created like:
>
> CREATE TABLE <table> (
>     ...
>     PRIMARY KEY (...)
> ) WITH CLUSTERING ORDER BY (...)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 7776000
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
>
>
> and this is the "nodetool cfstats" output for that table:
>
> Read Count: 39
> Read Latency: 85.03307692307692 ms.
> Write Count: 9845275
> Write Latency: 0.09604882382665797 ms.
> Pending Flushes: 0
> Table: <table>
> SSTable count: 48
> Space used (live): 19566109394
> Space used (total): 19566109394
> Space used by snapshots (total): 109796505570
> Off heap memory used (total): 11317941
> SSTable Compression Ratio: 0.22632301701483284
> Number of keys (estimate): 2557
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 828
> Local read count: 39
> Local read latency: 93.051 ms
> Local write count: 9845275
> Local write latency: 0.106 ms
> Pending flushes: 0
> Bloom filter false positives: 2
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 10200
> Bloom filter off heap memory used: 9816
> Index summary off heap memory used: 4677
> Compression metadata off heap memory used: 11303448
> Compacted partition minimum bytes: 150
> Compacted partition maximum bytes: 4139110981
> Compacted partition mean bytes: 13463937
> Average live cells per slice (last five minutes): 59.69230769230769
> Maximum live cells per slice (last five minutes): 149
> Average tombstones per slice (last five minutes): 8.564102564102564
> Maximum tombstones per slice (last five minutes): 42
>
>
>
> According to the "nodetool compactionhistory <keyspace>.<table>"
>
> the oldest timestamp is "Thu, 30 Jun 2016 13:14:23 GMT"
>
> and the most recent one is "Thu, 07 Jul 2016 12:15:50 GMT" (THAT IS TODAY)
>
>
>
> However, the SSTable count is still very high compared to tables that have a
> different compaction strategy. If I run "nodetool compact <table>", the
> SSTable count decreases dramatically to a reasonable number.
>
> I read many articles, including
> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> but I cannot really tell if this is expected behavior.
>
> What concerns me is that I have a high tombstone read count even though
> those are insert-only tables. Compacting the table makes the tombstone issue
> disappear. Yes, we are using TTL to expire data after 3 months and I have
> not touched the GC grace period.
>
> Looking at the file system, I see that the very first *-Data.db file is
> 15GB, and all the other 43 *-Data.db files range from 50 to 150MB in size.
>
>
>
> How can I debug this mis-compaction issue? Any help is much appreciated
>
> Best,
>
>
>
