incubator-cassandra-user mailing list archives

From Boris Solovyov <>
Subject Re: Seeking suggestions for a use case
Date Tue, 12 Feb 2013 20:08:21 GMT
Thanks. So in your use case, you actually keep parts of the same series in
different rows, to keep the rows from getting too wide? I thought Cassandra
worked OK with millions of columns per row. If I don't have to split a row
into parts, that keeps the data model simpler for me. (Otherwise, if I want
to split rows and reassemble them in client code, I could just use an RDBMS :-)
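
For reference, a minimal sketch of what "splitting a row into parts" means on the client side, assuming one row per series per day. The bucket width, the key layout, and the names `row_key` and `row_keys_for_range` are illustrative assumptions, not anything PlayOrm or Cassandra prescribes. At 1-sec granularity a one-day bucket caps a row at 86,400 columns.

```python
BUCKET_SECONDS = 86400  # assumed bucket width: one day of data per row


def row_key(series_id, ts, bucket_seconds=BUCKET_SECONDS):
    """Return the row key of the bucket that holds epoch-second timestamp ts."""
    bucket_start = ts - (ts % bucket_seconds)
    return "%s:%d" % (series_id, bucket_start)


def row_keys_for_range(series_id, start_ts, end_ts, bucket_seconds=BUCKET_SECONDS):
    """Return every row key a read of [start_ts, end_ts] must touch, oldest first."""
    if end_ts < start_ts:
        raise ValueError("end_ts must not be earlier than start_ts")
    first_bucket = start_ts - (start_ts % bucket_seconds)
    return ["%s:%d" % (series_id, bucket)
            for bucket in range(first_bucket, end_ts + 1, bucket_seconds)]
```

A read of recent data usually touches one or two keys, so the "reassembly" is a short loop over `row_keys_for_range` rather than a join.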

On Tue, Feb 12, 2013 at 12:07 PM, Hiller, Dean <> wrote:

> We are using cassandra for time series as well with PlayOrm.  A guess is
> we will be doing equal reads and writes on all the data going back 10
> years (in production we are write heavy right now).  We have
> 60,000 virtual tables (one table per sensor we read from, and yes, we have
> that many sensors).  We partition with PlayOrm partitioning, one month's
> worth for each of the virtual tables.  This gives us a wide row index into
> each partition that playorm creates, and the rest of the data varies
> between very narrow tables (one column) and tables with around 20 columns.
>  It seems to be working extremely well so far and we run it on 6 cassandra
> nodes as well.
> Anyways, thought I would share as perhaps it helps you understand your use
> case.
> Later,
> Dean
> On 2/12/13 8:08 AM, "Edward Capriolo" <> wrote:
> >Your use case is 100% on the money for Cassandra. But let me take a
> >chance to slam the other NoSQLs. (not really slam but you know)
> >
> >Riak is a key-value store. It is not a column family store where a
> >rowkey has a map of sorted values. This makes the time series more
> >awkward as the time series has to span many rows, rather than one
> >large row.
> >
> >HBase has similar problems with time-series. On one hand, if your
> >rowkeys are series you get hotspots; on the other, if your columns are
> >time series you run into two subtle issues. Last I checked, hbase's
> >on-disk format repeats the key each time (somewhat wasteful):
> >
> >key,column,value
> >key,column,value
> >key,column,value
> >
> >Also there are issues with really big rows, although they are dealt
> >with in a similar way to really wide rows in cassandra: just use time
> >as part of the row key and the rows will not get that large.
> >
> >I do not think you need leveled compaction for an append only
> >workload, although it might be helpful depending on how long you want
> >to keep these rows. If you are not keeping them very long possibly
> >leveled would keep the on disk size smaller.
> >
> >Column TTLs in cassandra do not require extra storage. It is a very
> >efficient way to do this. Otherwise you have to scan through your data
> >with some offline process and delete.
> >
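
A small sketch of the TTL approach for the one-week purge, assuming a CQL table with columns `series_id`, `ts`, and `value`. The table layout and the helper name `insert_with_ttl_cql` are hypothetical; only the `USING TTL <seconds>` clause is standard CQL. The helper just builds the statement text, leaving the choice of driver open.

```python
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800, the retention from the original question


def insert_with_ttl_cql(table, ttl_seconds=ONE_WEEK_SECONDS):
    """Build a parameterized CQL INSERT whose columns expire after ttl_seconds.

    The column names (series_id, ts, value) are an assumed schema.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return ("INSERT INTO %s (series_id, ts, value) "
            "VALUES (?, ?, ?) USING TTL %d" % (table, ttl_seconds))
```

With the TTL set at write time, no offline scan-and-delete job is needed.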
> >Do not worry about gc_grace too much. The moral is that, because of
> >distributed deletes, some data lives on disk for a while after it is
> >deleted. All this means is you need "some" more storage than just the
> >space for your live data.
> >
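
To put a rough number on "some" more storage, here is a back-of-the-envelope sketch. It assumes a steady write rate and that expired data can stay on disk for up to gc_grace_seconds after it expires. It ignores compaction lag, replication, and compression, so treat the result as an estimate only. The function name `disk_window_ratio` is illustrative.

```python
def disk_window_ratio(retention_seconds, gc_grace_seconds):
    """Ratio of the worst-case on-disk time window to the live-data window.

    Assumes a constant write rate and that expired data lingers for at most
    gc_grace_seconds before compaction drops it.
    """
    if retention_seconds <= 0 or gc_grace_seconds < 0:
        raise ValueError("retention must be positive and gc_grace non-negative")
    return (retention_seconds + gc_grace_seconds) / float(retention_seconds)


WEEK = 7 * 86400
# Default 10-day grace period with 1-week retention: 17 days on disk for 7 live.
default_ratio = disk_window_ratio(WEEK, 10 * 86400)
# A 1-day grace period with the same retention: 8 days on disk for 7 live.
short_ratio = disk_window_ratio(WEEK, 1 * 86400)
```

Under these assumptions the default grace period needs about 2.4 times the live-data space, and a one-day grace period about 1.14 times.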
> >Don't use row cache with wide rows REPEAT Don't use row cache with wide
> >rows
> >
> >Compaction throughput is metered on each node (again, not a setting to
> >worry about).
> >
> >If you are hitting flush_largest_memtables_at and
> >reduce_cache_capacity_to, it basically means you have over-tuned or
> >you do not have enough hardware. These are mostly emergency valves, and
> >if you are set up well they are not a factor. They are only around to
> >relieve memory pressure, to prevent the node from hitting a cycle where
> >it spends more time in GC than it does serving requests.
> >
> >Whew!
> >
> >Anyway. Nice to see that you are trying to understand the knobs,
> >before kicking the tires.
> >
> >On Tue, Feb 12, 2013 at 5:55 AM, Boris Solovyov
> ><> wrote:
> >> Hello list!
> >>
> >> I have an application with the following characteristics:
> >>
> >> data is time series, tens of millions of series at 1-sec granularity, like stock ticker data
> >> values are timestamp, integer (uint64)
> >> data is append only, never update
> >> data don't write in distant past, maybe sometimes write 10 sec ago but not more
> >> data is write mostly, like 99.9% write I think
> >> most read will be of recent data, always in range of timestamps
> >> data needs purge after some time, ex. 1 week
> >>
> >> I consider to use Cassandra. No other existing database (HBase, Riak,
> >>etc)
> >> seems well suited for this.
> >>
> >> Questions:
> >>
> >> Did I miss some other database that could work? Please suggest one if
> >> you know it.
> >> What are the benefits or drawbacks of leveled compaction for this workload?
> >> Setting column TTL seems a bad choice due to extra storage. Agree? Is it
> >> efficient to run a routine batch job to purge the oldest data? Will there
> >> be any gotcha with that (like a full scan of something instead of just
> >> the oldest data, maybe?)
> >> Will a column index be beneficial? If reads are scans, does it matter, or
> >> is it just extra work and storage space to maintain, without much benefit,
> >> especially since reads are rare?
> >> How does gc_grace_seconds impact operations in this workload? Will purges
> >> of old data leave sstables mostly obsolete, rather than sparsely obsolete?
> >> I think they will. So, after a purge, tombstones can be GC'd shortly, with
> >> no need for the default 10 days grace period. BUT, I read in the docs that
> >> if gc_grace_seconds is short, then nodetool repair needs to run quite
> >> often. Is that true? Why would that be needed in my use case?
> >> Related question: is it sensible to set tombstone_threshold to 1.0 but
> >> tombstone_compaction_interval to something short, like 1 hour? I suppose
> >> this depends on whether I am correct that SSTables will be deleted
> >> entirely, instead of just getting sparse.
> >> Should I disable row_cache_provider? It invalidates every row on update,
> >> right? I will be updating rows constantly, so it seems not beneficial.
> >> Docs say "compaction_throughput_mb_per_sec" is per "entire system." Does
> >> that mean per NODE, or per ENTIRE CLUSTER? Will this cause trouble with
> >> periodic deletions of expired columns? Do I need to make sure my purges
> >> of old data are trickled out over time to avoid huge overhead of
> >> compaction? But in that case, SSTables will become sparsely deleted,
> >> right? And then re-compacted, which seems wasteful if the remaining data
> >> will soon be purged again and there will be another re-compaction. So this
> >> is partially why I asked about tombstone_threshold and
> >> tombstone_compaction_interval -- I think it is best if I can purge data in
> >> such a way that Cassandra never recompacts SSTables, but just realizes
> >> "oh, whole thing is dead, I can delete, no work needed." But I am not sure
> >> if my considered settings will have unintended consequences.
> >> Finally, with the proposed workload, will there be trouble with
> >> flush_largest_memtables_at, reduce_cache_capacity_to, and
> >> reduce_cache_sizes_at? These are described as "emergency measures" in the
> >> docs. If my workload is an edge case that could trigger bad
> >> emergency-measure behavior, I hope you can tell me :-)
> >>
> >> Many thanks!
> >>
> >> Boris
