cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Cassandra and TTL
Date Tue, 12 Jan 2010 20:39:05 GMT
> I'm skeptical that this is a common use-case...

Fair enough. The idea is a sort of long-lived cache, as I want to store a
large volume of crawled documents. But I can't (and don't want to) keep those
documents forever, hence this TTL idea. Agreed, this is a rather specific
use case, but I can imagine other kinds of data that you may not want to keep
forever (news articles, old data (say, a tweet) that hasn't been used in a long
time, ...). Having to track such "outdated" data manually is just no fun. But
maybe it's not that common...

> If truncating old sstables entirely
> (https://issues.apache.org/jira/browse/CASSANDRA-531) meets your
> needs, that is going to be less work and more performant.

Well, I'm not sure I completely understand this ticket. The part of the
comment saying "drop all sstables older than X" seems helpful.
But aren't sstables regularly merged together, thus mixing "older" data with
newer data?
That is, does this 'truncate with a timestamp t' always remove *all* columns
with a timestamp older than t?
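
To make the concern concrete, here is a toy sketch in Python. It is not how
Cassandra or CASSANDRA-531 is implemented: an sstable is modelled as a plain
dict from column name to write timestamp, and merge() and drop_older_than()
are hypothetical helpers invented for illustration. It only shows why dropping
whole sstables may miss old columns once they have been merged with newer ones.

```python
def merge(*sstables):
    """Toy compaction: merge sstables, keeping the newest timestamp per column."""
    merged = {}
    for table in sstables:
        for name, ts in table.items():
            merged[name] = max(ts, merged.get(name, ts))
    return merged


def drop_older_than(sstables, t):
    """Keep only sstables that contain at least one column written at or after t."""
    return [s for s in sstables if max(s.values()) >= t]


old = {"doc1": 10, "doc2": 20}   # everything written before t
new = {"doc3": 90}               # written after t
t = 50

# Before compaction, the old sstable can be dropped as a whole.
separate = drop_older_than([old, new], t)

# After compaction, old and new columns share one sstable, which survives.
compacted = drop_older_than([merge(old, new)], t)
stale = [name for s in compacted for name, ts in s.items() if ts < t]
```

In this toy model, `separate` contains only the new sstable, while `stale`
still lists doc1 and doc2 after the merge, which is exactly the case I am
asking about.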

Thanks
--
Sylvain


> On Tue, Jan 12, 2010 at 10:45 AM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>> Hello,
>>
>> I have to deal with a lot of different data and Cassandra seems to be a good
>> fit for my needs so far. However, some of this data is volatile by nature and
>> for that data, I would need to set something akin to a TTL. Those TTLs could
>> be long, but keeping the data forever would be useless.
>>
>> I could deal with that by hand, writing some daemon that runs regularly and
>> removes what should be removed. However, this is neither particularly
>> efficient nor convenient, and I would find it really cool to be able to
>> provide a TTL when inserting something and not have to care beyond that.
>>
>> Which leads me to my question: why doesn't Cassandra allow setting a TTL for
>> data? Is it for technical reasons? For philosophical reasons? Or has nobody
>> simply needed it enough to write it?
>>
>> From what I understand of how Cassandra works, it seems to me that it
>> could be done pretty efficiently (even though I agree that it wouldn't
>> be a minor change). That is, it would require adding a TTL to each column
>> (and/or row). When reading a column whose timestamp + TTL has passed, it
>> would be ignored (as for a tombstoned column). Then, during compaction,
>> expired columns would be collected.
>>
>> Are there any major difficulties/obstacles I don't see?
>> Or maybe there is some trick I don't know about that allows doing such a
>> thing already?
>>
>> And if not, would that be something that would interest the Cassandra
>> community? Or does nobody ever need such a thing? (I personally believe it
>> to be a desirable feature, but maybe I am the only one.)
>>
>> Thanks,
>> Sylvain
>>
>
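
The per-column TTL proposal quoted above (ignore a column on read once
timestamp + ttl has passed, then collect it during compaction) can be sketched
roughly as follows. This is a minimal Python illustration under stated
assumptions, not Cassandra's code: the Column class, its field names, read()
and compact() are all hypothetical, timestamps and TTLs are assumed to be in
seconds, and compact() assumes it sees every version of every column.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int              # write time, in seconds (assumed unit)
    ttl: Optional[int] = None   # lifetime in seconds; None means keep forever

    def is_expired(self, now: int) -> bool:
        # A column is expired once timestamp + ttl has been reached.
        return self.ttl is not None and self.timestamp + self.ttl <= now


def read(columns: List[Column], name: str, now: int) -> Optional[Column]:
    """Return the newest version of a column, or None if it is absent or expired."""
    versions = [c for c in columns if c.name == name]
    if not versions:
        return None
    newest = max(versions, key=lambda c: c.timestamp)
    # An expired column is ignored on read, much like a tombstoned one.
    return None if newest.is_expired(now) else newest


def compact(columns: List[Column], now: int) -> List[Column]:
    """Keep the newest version of each column and drop the expired ones."""
    newest = {}
    for c in columns:
        if c.name not in newest or c.timestamp > newest[c.name].timestamp:
            newest[c.name] = c
    return [c for c in newest.values() if not c.is_expired(now)]
```

For example, a column written at time 100 with ttl=50 would be readable at
time 120, ignored at time 150, and removed by any compaction run at or after
time 150. A real implementation would also have to handle compactions that
see only some of the sstables, which this sketch does not attempt.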
