Your use case is 100% on the money for Cassandra. But let me take a
chance to slam the other NoSQLs. (not really slam but you know)
Riak is a key-value store. It is not a column family store where a
rowkey has a map of sorted values. This makes time series more
awkward, as a series has to span many rows rather than one.
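To make "a rowkey has a map of sorted values" concrete, here is a minimal
in-memory sketch in Python. It is only a model of the idea, not how
Cassandra stores anything, and the WideRow class and its method names are
made up for illustration.

```python
import bisect


class WideRow:
    """Toy model of one row: column names (timestamps) kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, ts, value):
        # Keep the name list sorted so range reads stay cheap.
        if ts not in self._values:
            bisect.insort(self._names, ts)
        self._values[ts] = value

    def slice(self, start, end):
        # Return all (ts, value) pairs with start <= ts <= end, in order.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(ts, self._values[ts]) for ts in self._names[lo:hi]]
```

A time range read is then one slice of one row. In a plain key-value
store each point (or chunk of points) is its own key, so the same read
touches many keys.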
HBase has similar problems with time series. On one hand, if your
rowkeys are series you get hotspots; on the other, if your columns are
time series you run into two subtle issues. Last I checked, HBase's
on-disk format repeats the key each time (somewhat wasteful).
Also, there are issues with really big rows, although they are dealt
with in a similar way to really wide rows in Cassandra: just use time
as part of the row key and the rows will not get that large.
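A rough sketch of "use time as part of the row key", in Python. The
one-day bucket width, the "<series>:<bucket start>" key layout, and the
function names are all assumptions for illustration; pick a bucket size
that keeps rows at a width you are comfortable with.

```python
BUCKET_SECONDS = 24 * 60 * 60  # assumed bucket width: one row per series per day


def row_key(series_id, epoch_seconds, bucket_seconds=BUCKET_SECONDS):
    """Row key for one data point: series id plus the start of its time bucket."""
    bucket_start = epoch_seconds - (epoch_seconds % bucket_seconds)
    return f"{series_id}:{bucket_start}"


def row_keys_for_range(series_id, start, end, bucket_seconds=BUCKET_SECONDS):
    """Row keys a reader has to visit to cover the range [start, end]."""
    first = start - (start % bucket_seconds)
    return [f"{series_id}:{b}" for b in range(first, end + 1, bucket_seconds)]
```

At 1-sec granularity a one-day bucket caps each row at 86400 columns, and
a read of recent data touches only the last one or two rows of a series.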
I do not think you need leveled compaction for an append-only
workload, although it might be helpful depending on how long you want
to keep these rows. If you are not keeping them very long, leveled
would possibly keep the on-disk size smaller.
Column TTLs in Cassandra do not require extra storage. They are a very
efficient way to purge old data. Otherwise you have to scan through
your data with some offline process and delete.
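For the one-week purge in the question, the TTL is just a number of
seconds attached to each write. A small Python sketch that builds a CQL
insert with USING TTL follows; the table name "ticks" and the column
names are hypothetical, and the helper only formats the statement text,
it does not talk to a cluster.

```python
RETENTION_SECONDS = 7 * 24 * 60 * 60  # one week, as in the question


def insert_with_ttl(table, ttl_seconds=RETENTION_SECONDS):
    """Build a parameterized CQL INSERT whose columns expire after ttl_seconds."""
    return (
        f"INSERT INTO {table} (series_id, ts, value) "
        f"VALUES (?, ?, ?) USING TTL {ttl_seconds}"
    )


statement = insert_with_ttl("ticks")  # "ticks" is a made-up table name
```

Every column written this way expires on its own, so there is no purge
job to schedule.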
Do not worry about gc_grace too much. The moral is that, because of
distributed deletes, some data lives on disk for a while after it is
deleted. All this means is you need "some" more storage than just the
space for your live data.
Don't use row cache with wide rows REPEAT Don't use row cache with wide rows
Compaction throughput is metered on each node (again not a setting to
If you are hitting flush_largest_memtables_at and
reduce_cache_capacity_to, it basically means you have over-tuned or
you do not have enough hardware. These are mostly emergency valves, and
if you are set up well they are not a factor. They are only around to
relieve memory pressure, to prevent the node from hitting a cycle where
it spends more time in GC than it does serving requests.
Anyway. Nice to see that you are trying to understand the knobs,
before kicking the tires.
On Tue, Feb 12, 2013 at 5:55 AM, Boris Solovyov
> Hello list!
> I have application with following characteristics:
> data is time series, tens of millions of series at 1-sec granularity, like
> stock ticker data
> values are timestamp, integer (uint64)
> data is append only, never update
> data don't write in distant past, maybe sometimes write 10 sec ago but not
> data is write mostly, like 99.9% write I think
> most read will be of recent data, always in range of timestamps
> data needs purge after some time, ex. 1 week
> I consider to use Cassandra. No other existing database (HBase, Riak, etc)
> seems well suited for this.
> Did I miss some others database that could work? Please suggest me if you
> know one.
> What are benefits or drawbacks of leveled compaction for this workload?
> Setting column TTL seems bad choice due to extra storage. Agree? Is
> efficient to run routine batch job to purge oldest data? Is there will be
> any gotcha with that (like fullscan of something instead of just oldest,
> Will column index beneficial? If reads are scans, does it matter, or is it
> just extra work and storage space to maintain, without much benefit
> especially since reads are rare?
> How gc_grace_seconds impacts operations in this workload? Will purges of old
> data leave sstables mostly obsolete, rather than sparsely obsolete? I think
> they will. So, after purge, tombstones can be GC shortly, no need for
> default 10 days grace period. BUT, I read in docs that if gc_grace_seconds
> is short, then nodetool repair needs run quite often. Is that true? Why
> would that be needed in my use case?
> Related question: is it sensible to set tombstone_threshold to 1.0 but
> tombstone_compaction_interval to something short, like 1 hour? I suppose
> this depends on whether I am correct that SSTables will be deleted entirely,
> instead of just getting sparse.
> Should I disable row_cache_provider? It invalidates every row on update,
> right? I will be updating rows constantly, so it seems not benefitial.
> Docs say "compaction_throughput_mb_per_sec" is per "entire system." Does
> that mean per NODE, or per ENTIRE CLUSTER? Will this cause trouble with
> periodic deletions of expired columns? Do I need to make sure my purges of
> old data are trickled out over time to avoid huge overhead of compaction?
> But in that case, SSTables will become sparsely deleted, right? And then
> re-compacted, which seems wasteful if the remaining data will soon be purged
> again and there will be another re-compaction. So this is partially why I
> asked about tombstone-threshold and compaction interval -- I think is best
> if I can purge data in such a way that Cassandra never recompacts SsTables,
> but just realizes "oh, whole thing is dead, I can delete, no work needed."
> But I am not sure if my considered settings will have unintended
> Finally, with proposed workload, will there be troubles with
> flush_larges_memtables_at and reduce_cache_capacity_to,
> reduce_cache_sizes_at? These are describe as "emergency measures" in docs.
> If my workload is edge case that could trigger bad emergency-measure
> behavior I hope you can say me that :-)
> Many thanks!