cassandra-user mailing list archives

From Steven Mac <ugs...@hotmail.com>
Subject RE: Advice wanted on modeling
Date Thu, 13 Jan 2011 07:18:15 GMT

> Date: Thu, 13 Jan 2011 01:29:33 +0100
> Subject: Re: Advice wanted on modeling
> From: peter.schuller@infidyne.com
> To: user@cassandra.apache.org
> 
> > The application will have a large number of records, with the records
> > consisting of a fixed part and a number (n) of periodic parts.
> > * The fixed part is updated occasionally.
> > * The periodic parts are never updated, but a new one is added every 5 to 10
> > minutes. Only the last n periodic parts need to be kept, so that the oldest
> > one can be deleted after adding a new part.
> > * The records will always be read completely (meaning fixed part and all
> > periodic parts). Reads are less frequent than writes.
> > The application will be running continuously, at least for a few weeks, so
> > there will be many, many stale periodic parts, so I'm a bit worried about
> > disk consumption and compactions.
> 
> I was going to hit send on a partial recommendation but realized I
> don't really have enough information given that you seem to be making
> pretty specific optimizations.
> 
> You say writes are more frequent than reads. To what extent - are
> reads *very* infrequent to the point that the performance of the reads
> is almost completely irrelevant?

What exactly is a write? Is it a single record update, or a batch of record
updates executed in one operation? In my case I'm batching about a thousand
record updates (new periodic parts) into a single batch_mutate. A read would
constitute fetching all parts of a single record. In the text below I'm using the
term update to mean a record update.
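
To make the terminology concrete, here is a minimal in-memory sketch of one
record update and of grouping updates into batches of about a thousand. It does
not call Cassandra at all: the Record class, the ("insert"/"delete", key,
timestamp) tuples, MAX_PARTS and the batches() helper are hypothetical names
used only for illustration, and each yielded chunk merely stands in for what
would be sent in one batch_mutate.

```python
from collections import deque
from itertools import islice

BATCH_SIZE = 1000  # about a thousand record updates per batch, as described above
MAX_PARTS = 3      # hypothetical value of n, chosen only for illustration


class Record:
    """In-memory stand-in for one record: a fixed part plus the last n periodic parts."""

    def __init__(self, key, fixed):
        self.key = key
        self.fixed = fixed
        self.parts = deque()  # (timestamp, payload) pairs, oldest first

    def update(self, timestamp, payload):
        """Add a new periodic part and return the mutations this update implies."""
        mutations = [("insert", self.key, timestamp)]
        self.parts.append((timestamp, payload))
        if len(self.parts) > MAX_PARTS:
            # Only the last n parts are kept, so the oldest one is deleted.
            oldest_ts, _ = self.parts.popleft()
            mutations.append(("delete", self.key, oldest_ts))
        return mutations


def batches(updates, size=BATCH_SIZE):
    """Yield lists of at most `size` updates, each meant for a single batch call."""
    it = iter(updates)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Once a record already holds n parts, every update carries one insert and one
delete, which is where the stale periodic parts come from.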

I typically expect a few reads for every thousand updates (<1%), although
read pressure will vary considerably over time. I don't expect more than a hundred
reads for every thousand updates (about 10%). Read performance is not irrelevant,
but definitely subordinate to write performance, which is crucial (and one of the
reasons I selected Cassandra).

> You seem worried about tombstones and data size. Is the issue that
> you're expecting huge amounts of data and disk space/compaction
> frequency is an issue?

Yes, I am expecting huge amounts of data, and without compaction I would
run out of disk space soon (within a few days to a week).

> Are you expecting write load to be high such that performance of
> writes (and compaction) is a concern, or is it mostly about slowly
> building up huge amounts of data that you want to be compact on disk?

I'm not sure here. My write load is high, estimated at a thousand records
per second (batched, of course).
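
As a rough back-of-the-envelope check, the figures in this thread (a thousand
updates per second, one new periodic part per record every 5 to 10 minutes)
imply the following record count and daily update volume. This is only a
sketch: it assumes a steady load and that every record is updated exactly once
per interval, and the function names are made up for the example.

```python
UPDATES_PER_SECOND = 1000               # estimated write load from this message
INTERVAL_SECONDS = (5 * 60, 10 * 60)    # one new part per record every 5 to 10 minutes
SECONDS_PER_DAY = 24 * 60 * 60


def implied_record_count(rate, interval):
    """Records needed to sustain `rate` updates/s if each is updated once per `interval` s."""
    return rate * interval


def updates_per_day(rate):
    """Total record updates per day at a steady `rate` updates/s."""
    return rate * SECONDS_PER_DAY


low, high = (implied_record_count(UPDATES_PER_SECOND, i) for i in INTERVAL_SECONDS)
print(f"implied records: {low:,} to {high:,}")                       # 300,000 to 600,000
print(f"updates per day: {updates_per_day(UPDATES_PER_SECOND):,}")   # 86,400,000
```

Since each update also deletes the oldest part once a record holds n parts, the
number of deleted parts per day would be of the same order as the number of
updates per day.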
 		 	   		  