cassandra-user mailing list archives

From Riccardo Ferrari <ferra...@gmail.com>
Subject Re: DTCS SSTable count issue
Date Mon, 11 Jul 2016 20:35:52 GMT
@Alain, @Jeff

Thank you very much for your time. I really appreciate it!

Yes, I found many posts and hints about TWCS, and it definitely looks very
promising. Do I understand correctly that I can swap the compaction strategy
without any major concern?

About read repair: am I correct in thinking that it is controlled by both
options, 'read_repair_chance' and 'dclocal_read_repair_chance'?
If that is the case, I see that I still have read repair turned on, since
'dclocal_read_repair_chance' is 0.1 on these tables...
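
To make my reading of those two options concrete, here is a minimal Python
sketch. It assumes that chance-based read repair is active whenever either
option is above zero, which is my understanding for pre-4.0 versions and not
something confirmed in this thread. The function names and the keyspace and
table names are hypothetical.

```python
def chance_read_repair_enabled(read_repair_chance, dclocal_read_repair_chance):
    """True if either chance-based read repair option is above zero (assumption)."""
    return read_repair_chance > 0.0 or dclocal_read_repair_chance > 0.0


def disable_read_repair_cql(keyspace, table):
    """Build the CQL statement that sets both table options to 0.0."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        "WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0;"
    )


# Values taken from the table definition quoted further down.
print(chance_read_repair_enabled(0.0, 0.1))
# "my_keyspace" and "my_table" are placeholders, not real names.
print(disable_read_repair_cql("my_keyspace", "my_table"))
```

The generated statement would still have to be run through cqlsh or a driver.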

Best!

On Mon, Jul 11, 2016 at 10:05 PM, Alain RODRIGUEZ <arodrime@gmail.com>
wrote:

> @Jeff
>
> Rather than being an alternative, isn't your compaction strategy going to
> deprecate (and finally replace) DTCS ? That was my understanding from the
> ticket CASSANDRA-9666.
>
> @Riccardo
>
> If you are interested in TWCS from Jeff, I believe it has been introduced
> in 3.0.8 actually, not 3.0.7
> https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28.
> Anyway, you can use it in any recent version as compaction strategies are
> pluggable.
>
>> What concerns me is that I have a high tombstone read count despite those
>> being insert-only tables. Compacting the table makes the tombstone issue
>> disappear. Yes, we are using TTL to expire data after 3 months and I have
>> not touched the GC grace period.
>>
>
> I observed the same issue recently and I am confident that TWCS will solve
> this tombstone issue, but I have not tested it on my side so far. Meanwhile,
> be sure you have disabled any "read repair" on tables using DTCS, and maybe
> hints as well. It is a hard decision to make, as you'll lose 2 out of 3
> anti-entropy systems, but DTCS behaves badly with those options turned on
> (TWCS is fine with them). The last anti-entropy system is a full repair,
> which you might already not be running as you only do inserts...
>
> Also, instead of major compactions (which come with their own set of issues
> and tradeoffs), you can think of a script that smartly uses sstablemetadata
> to find the sstables holding too many tombstones and runs single-SSTable
> compactions on them through JMX and user defined compactions. Meanwhile, if
> you want to do it manually, you could use something like this to get the
> tombstone ratio of the biggest sstables:
>
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' &&
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' |
> xargs sstablemetadata | grep tombstones
>
> And something like this to run a user defined compaction on the ones you
> chose (big sstable with high tombstone ratio):
>
> echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction <Data_db_file_name_without_path>" | java -jar jmxterm-version.jar -l <ip>:<jmx_port>
>
> *note:* you have to download jmxterm (or use any other jmx tool).
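
The script idea above could be sketched roughly as follows in Python. This is
only a sketch under assumptions: it expects sstablemetadata to be on the PATH
and to print a line of the form "Estimated droppable tombstones: <ratio>". The
0.2 threshold and all function names are made up for illustration.

```python
import re
import subprocess

# Line that sstablemetadata is assumed to print, e.g.
# "Estimated droppable tombstones: 0.42"
_RATIO_RE = re.compile(r"Estimated droppable tombstones:\s*([0-9.eE+-]+)")


def droppable_ratio(metadata_text):
    """Return the droppable tombstone ratio from sstablemetadata output, or None."""
    match = _RATIO_RE.search(metadata_text)
    return float(match.group(1)) if match else None


def candidates(ratios_by_file, threshold=0.2):
    """Return (file, ratio) pairs above the threshold, highest ratio first."""
    picked = [
        (name, ratio)
        for name, ratio in ratios_by_file.items()
        if ratio is not None and ratio > threshold
    ]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


def scan(data_files):
    """Run sstablemetadata on each *-Data.db path and collect the ratios."""
    ratios = {}
    for path in data_files:
        result = subprocess.run(
            ["sstablemetadata", path],
            capture_output=True, text=True, check=True,
        )
        ratios[path] = droppable_ratio(result.stdout)
    return ratios
```

Calling candidates(scan(paths)) on a list of *-Data.db paths would give the
files worth passing to forceUserDefinedCompaction, worst offenders first.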
>
>
> Did you try the unchecked_tombstone_compaction option as well (a
> compaction option at the table level)? Feel free to set this one to true.
> I think it could be the default. It is safe as long as your machines have
> some spare resources available (not that much). That's the first thing I
> would do.
>
>
> Also, if you use TTL only, feel free to reduce gc_grace_seconds; this
> will probably help to get tombstones removed. I would start with the other
> solutions first. Keep in mind that if someday you perform deletes, this
> setting could produce zombies (deleted data coming back) if you don't run
> repair on the entire ring within gc_grace_seconds.
>
> C*heers,
>
> -----------------------
>
> Alain Rodriguez - alain@thelastpickle.com
>
> France
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
> 2016-07-07 19:25 GMT+02:00 Jeff Jirsa <jeff.jirsa@crowdstrike.com>:
>
>> 48 sstables isn’t unreasonable in a DTCS table. It will continue to grow
>> over time, but ideally data will expire as it nears your 90 day TTL and
>> those tables should start dropping away as they age.
>>
>>
>>
>> 3.0.7 introduces an alternative to DTCS you may find easier to use called
>> TWCS. It will almost certainly help address the growing sstable count.
>>
>>
>>
>>
>>
>>
>>
>> *From: *Riccardo Ferrari <ferrarir@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Thursday, July 7, 2016 at 6:49 AM
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *DTCS SSTable count issue
>>
>>
>>
>> Hi everyone,
>>
>>
>>
>> This is my first question, so apologies if I do something wrong.
>>
>>
>>
>> I have a small Cassandra cluster built on 3 nodes. It started as a 2.0.X
>> cluster and was upgraded to 2.0.15, then 2.1.13, then 3.0.4 and, recently,
>> 3.0.6. Ubuntu is the OS.
>>
>>
>>
>> There are a few tables that use DateTieredCompactionStrategy and are
>> suffering from a constantly growing SSTable count. I have the feeling this
>> has something to do with the upgrade; however, I need some hints on how to
>> debug this issue.
>>
>>
>>
>> Tables are created like:
>>
>> CREATE TABLE <table> (
>>  ...
>> PRIMARY KEY (...)
>> ) WITH CLUSTERING ORDER BY (...)
>>     AND bloom_filter_fp_chance = 0.01
>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>     AND comment = ''
>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND crc_check_chance = 1.0
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 7776000
>>     AND gc_grace_seconds = 864000
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99PERCENTILE';
>>
>>
>>
>> and this is the "nodetool cfstats" output for that table:
>>
>> Read Count: 39
>> Read Latency: 85.03307692307692 ms.
>> Write Count: 9845275
>> Write Latency: 0.09604882382665797 ms.
>> Pending Flushes: 0
>> Table: <table>
>> SSTable count: 48
>> Space used (live): 19566109394
>> Space used (total): 19566109394
>> Space used by snapshots (total): 109796505570
>> Off heap memory used (total): 11317941
>> SSTable Compression Ratio: 0.22632301701483284
>> Number of keys (estimate): 2557
>> Memtable cell count: 0
>> Memtable data size: 0
>> Memtable off heap memory used: 0
>> Memtable switch count: 828
>> Local read count: 39
>> Local read latency: 93.051 ms
>> Local write count: 9845275
>> Local write latency: 0.106 ms
>> Pending flushes: 0
>> Bloom filter false positives: 2
>> Bloom filter false ratio: 0.00000
>> Bloom filter space used: 10200
>> Bloom filter off heap memory used: 9816
>> Index summary off heap memory used: 4677
>> Compression metadata off heap memory used: 11303448
>> Compacted partition minimum bytes: 150
>> Compacted partition maximum bytes: 4139110981
>> Compacted partition mean bytes: 13463937
>> Average live cells per slice (last five minutes): 59.69230769230769
>> Maximum live cells per slice (last five minutes): 149
>> Average tombstones per slice (last five minutes): 8.564102564102564
>> Maximum tombstones per slice (last five minutes): 42
>>
>>
>>
>> According to the "nodetool compactionhistory <keyspace>.<table>"
>>
>> the oldest timestamp is "Thu, 30 Jun 2016 13:14:23 GMT"
>>
>> and the most recent one is "Thu, 07 Jul 2016 12:15:50 GMT" (THAT IS TODAY)
>>
>>
>>
>> However, the SSTable count is still very high compared to tables that use
>> a different compaction strategy. If I run "nodetool compact <table>", the
>> SSTable count decreases dramatically to a reasonable number.
>>
>> I read many articles including:
>> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
>> however I cannot really tell whether this is expected behavior.
>>
>> What concerns me is that I have a high tombstone read count despite
>> those being insert-only tables. Compacting the table makes the tombstone
>> issue disappear. Yes, we are using TTL to expire data after 3 months and I
>> have not touched the GC grace period.
>>
>> Looking at the file system, I see that the very first *-Data.db file is
>> 15GB, while the other 43 *-Data.db files range from 50 to 150MB in size.
>>
>>
>>
>> How can I debug this mis-compaction issue? Any help is much appreciated.
>>
>> Best,
>>
>
>
