cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Global TTL vs Insert TTL
Date Wed, 01 Feb 2017 18:39:31 GMT
I was referring to this JIRA,
https://issues.apache.org/jira/browse/CASSANDRA-3974, when talking about
dropping an entire SSTable at compaction time.

But the JIRA is pretty old, and it is very possible that the optimization is
no longer there.



On Wed, Feb 1, 2017 at 6:53 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> This is incorrect: no optimization references the table-level TTL
> setting. The max local deletion time is stored in SSTable metadata. See
> org.apache.cassandra.io.sstable.metadata.StatsMetadata#maxLocalDeletionTime
> in the Cassandra 3.0 branch. The default TTL is stored
> here: org.apache.cassandra.schema.TableParams#defaultTimeToLive and is
> never referenced during compaction.
>
> Here's an example from a table I created without a default TTL; you can
> use the sstablemetadata tool to see it:
>
> jhaddad@rustyrazorblade ~/dev/cassandra/data/data/test$
> ../../../tools/bin/sstablemetadata a-7bca6b50e8a511e6869a5596edf4dd35/mc-1-big-Data.db
> .....
> SSTable max local deletion time: 1485980862
>
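As a minimal sketch of how the value above could be read, the snippet below parses the `SSTable max local deletion time` line from sstablemetadata output and converts it to a UTC timestamp. The helper names are hypothetical, and the expiry check is a deliberate simplification: it only compares the value against "now minus gc_grace_seconds" (assuming the default of 864000 seconds), whereas Cassandra also has to consider overlapping SSTables before dropping one.

```python
import re
from datetime import datetime, timezone

# Assumption: the table uses the default gc_grace_seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864000


def parse_max_local_deletion_time(output: str) -> int:
    """Extract the epoch-seconds value from sstablemetadata output (hypothetical helper)."""
    match = re.search(r"SSTable max local deletion time:\s*(\d+)", output)
    if match is None:
        raise ValueError("max local deletion time not found in output")
    return int(match.group(1))


def could_be_fully_expired(max_ldt: int, now: int,
                           gc_grace: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Simplified check: every cell's deletion time is older than now - gc_grace.

    This ignores overlap with other SSTables, which the real compaction
    logic also takes into account.
    """
    return max_ldt < now - gc_grace


sample = "SSTable max local deletion time: 1485980862"
max_ldt = parse_max_local_deletion_time(sample)
expires_at = datetime.fromtimestamp(max_ldt, tz=timezone.utc)
print(expires_at.isoformat())  # 2017-02-01T20:27:42+00:00
```

Note that nothing in this check needs the table-level default TTL: the per-SSTable value is enough, which matches the point made in the message above.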
> On Wed, Feb 1, 2017 at 6:59 AM DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> Global TTL is better than dynamic runtime TTL.
>>
>> Why?
>>
>> Because Global TTL is a table property and Cassandra can perform
>> optimizations when compacting.
>>
>> For example, if it can see that the maxTimestamp of an SSTable is older
>> than the table Global TTL, the SSTable can be entirely dropped during
>> compaction.
>>
>> Using dynamic TTL at runtime, since Cassandra doesn't know and cannot
>> track each individual TTL value, the previous optimization is not possible
>> (even if you always use the SAME TTL for all queries, Cassandra is not
>> supposed to know that).
>>
>>
>>
>> On Wed, Feb 1, 2017 at 3:01 PM, Cogumelos Maravilha <
>> cogumelosmaravilha@sapo.pt> wrote:
>>
>> Thank you all for your answers.
>>
>> On 02/01/2017 01:06 PM, Carlos Rolo wrote:
>>
>> To reinforce Alain's statement:
>>
>> "I would say that the unsafe part is more about using C* 3.9" this is
>> key. You would be better off on 3.0.x unless you need features from the 3.x
>> series.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>> linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 918 918 100
>> www.pythian.com
>>
>> On Wed, Feb 1, 2017 at 8:32 AM, Alain RODRIGUEZ <arodrime@gmail.com>
>> wrote:
>>
>> Is it safe to use TWCS in C* 3.9?
>>
>>
>> I would say that the unsafe part is more about using C* 3.9 than using
>> TWCS in C* 3.9 :-). I see no reason to say TWCS would be specifically unsafe
>> in C* 3.9, but I might be missing something.
>>
>> Going from STCS to TWCS is often smooth. From LCS, you might expect
>> extra load from compacting a lot (all?) of the SSTables, from what we saw in
>> the field. In this case, be sure that your compaction options are safe enough
>> to handle this.
>>
>> TWCS is even easier to use on C* 3.0.8+ and C* 3.8+, as it now ships with
>> Cassandra as the replacement for DTCS, so no extra jar is needed; you can
>> enable TWCS like any other built-in compaction strategy.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-01-31 23:29 GMT+01:00 Cogumelos Maravilha <
>> cogumelosmaravilha@sapo.pt>:
>>
>> Hi Alain,
>>
>> Thanks for your response and the links.
>>
>> I've also checked "Time series data model and tombstones".
>>
>> Is it safe to use TWCS in C* 3.9?
>>
>> Thanks in advance.
>>
>> On 31-01-2017 11:27, Alain RODRIGUEZ wrote:
>>
>> Is there an overhead using line by line option or wasted disk space?
>>
>> There is a very recent topic about that in the mailing list, look for "Time
>> series data model and tombstones". I believe DuyHai answered your question
>> there with more details :).
>>
>> tl;dr:
>>
>> Yes, if you know the TTL in advance, and it is fixed, you might want to
>> go with the table option instead of adding the TTL in each insert. Also, you
>> might want to consider using the TWCS compaction strategy.
>>
>> Here are some blog posts my coworkers recently wrote about TWCS; they might
>> be useful:
>>
>> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>> http://thelastpickle.com/blog/2017/01/10/twcs-part2.html
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>>
>> 2017-01-31 10:43 GMT+01:00 Cogumelos Maravilha <
>> cogumelosmaravilha@sapo.pt>:
>>
>> Hi, I'm just wondering which option is fastest:
>>
>> Global: create table xxx (..... AND default_time_to_live = XXX;
>> and UPDATE xxx USING TTL XXX;
>>
>> Line by line:
>> INSERT INTO xxx (... USING TTL xxx;
>>
>> Is there an overhead using line by line option or wasted disk space?
>>
>> Thanks in advance.
>>
>>
>>
>>
>>
>>
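
The two options in the original question (a table-level default_time_to_live versus a per-statement USING TTL) can be compared with a small model. This is only a sketch under stated assumptions, not Cassandra's implementation: the function names are hypothetical, a TTL of 0 is taken to mean "never expires", and an unspecified per-statement TTL is taken to fall back to the table default. Under those assumptions both paths record the same expiry time on the written data, which is consistent with the point made in the thread that compaction works from the recorded deletion times rather than from the table setting.

```python
from typing import Optional


def effective_ttl(default_time_to_live: int,
                  using_ttl: Optional[int] = None) -> int:
    """TTL in seconds recorded with the write (hypothetical model).

    A per-statement TTL, when given, takes precedence over the table default.
    """
    return default_time_to_live if using_ttl is None else using_ttl


def local_deletion_time(write_time: int, ttl: int) -> Optional[int]:
    """Epoch second at which the data expires, or None if it never does."""
    return write_time + ttl if ttl > 0 else None


write_time = 1485900000  # hypothetical write time, epoch seconds

# Option 1: table created with default_time_to_live = 86400, plain INSERT.
via_table_default = local_deletion_time(write_time, effective_ttl(86400))

# Option 2: table without a default TTL, INSERT ... USING TTL 86400.
via_using_ttl = local_deletion_time(write_time, effective_ttl(0, using_ttl=86400))

print(via_table_default == via_using_ttl)  # True
```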
