cassandra-user mailing list archives

From Aaron Turner <>
Subject Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
Date Mon, 27 Aug 2012 18:10:13 GMT
On Mon, Aug 27, 2012 at 1:19 AM, aaron morton <> wrote:
>> After thinking about how
>> sstables are done on disk, it seems best (required??) to write out
>> each row at once.
> Sort of. We only want one instance of the row per SSTable created.
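The following toy model makes the distinction concrete. It is written in Python and is not Cassandra code: the function `flush_sstable` and the dict-based layout are hypothetical. In this model, cells for the same row that arrive at different times are merged before the flush, so each row key appears only once in the resulting file even though it was written in several pieces.

```python
from collections import defaultdict


def flush_sstable(cells):
    """Toy model (hypothetical, not Cassandra's API): merge buffered
    (row_key, column, value) cells so that each row key is written once,
    then emit the rows in sorted key order, as an SSTable would hold them."""
    rows = defaultdict(dict)
    for row_key, column, value in cells:
        # A later cell for the same column overwrites the earlier one.
        rows[row_key][column] = value
    return [(key, dict(sorted(cols.items()))) for key, cols in sorted(rows.items())]


# Cells for rows "a" and "b" arrive interleaved, not one whole row at a time.
cells = [("b", "c1", 1), ("a", "c1", 2), ("b", "c2", 3), ("a", "c2", 4)]
sstable = flush_sstable(cells)
print([key for key, _ in sstable])  # ['a', 'b']
```

In this model, a row does not have to be written all at once. Its pieces only need to be in the same buffer when the file is flushed.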

Ah, good clarification, although I think for my purposes they're one
and the same.

>> Any other tips to improve load time or reduce the load on the cluster
>> or subsequent compaction activity?
> Fewer SSTables means less compaction, so go as high as you can on the
> bufferSizeInMB param for the
> SSTableSimpleUnsortedWriter.
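Here is a rough model of why a larger buffer helps. The function `simulate_unsorted_writer` is hypothetical, and its cell-count limit is only a stand-in for `bufferSizeInMB`. The real writer flushes based on an estimate of the buffered data size, not on a cell count. When the source emits rows interleaved, a small buffer produces more SSTables and also spreads each row across more of them, which is what compaction later has to merge.

```python
from collections import defaultdict


def simulate_unsorted_writer(cells, buffer_limit):
    """Toy model (hypothetical) of a buffering, unsorted writer: cells are
    held in memory and flushed to a new SSTable whenever buffer_limit cells
    have accumulated. buffer_limit stands in for bufferSizeInMB.
    Returns (number of SSTables, max number of SSTables any row spans)."""
    sstables = []
    buffer = set()
    buffered = 0
    for row_key, _column in cells:
        buffer.add(row_key)
        buffered += 1
        if buffered >= buffer_limit:
            sstables.append(buffer)
            buffer, buffered = set(), 0
    if buffer:  # flush whatever is left at close()
        sstables.append(buffer)
    spans = defaultdict(int)
    for table in sstables:
        for row_key in table:
            spans[row_key] += 1
    return len(sstables), max(spans.values(), default=0)


# 100 rows x 10 columns, emitted column by column (rows interleaved).
cells = [(f"row{r}", c) for c in range(10) for r in range(100)]
for limit in (100, 250, 1000):
    print(limit, simulate_unsorted_writer(cells, limit))
# 100 (10, 10)
# 250 (4, 4)
# 1000 (1, 1)
```

With the smallest buffer, every row ends up split across all 10 files. With a buffer large enough for the whole input, each row is written exactly once.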


> There is also an SSTableSimpleWriter. Because it expects rows to be ordered,
> it does not buffer and can create bigger sstables.
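The next sketch shows the trade-off in a toy model. The function `write_sorted` is hypothetical. It holds nothing in memory, but it rejects a row whose key does not sort after the previous one. Cassandra compares keys in partitioner order, and with a hashing partitioner that order is by token rather than by the raw key. The sketch uses plain string order as a stand-in.

```python
def write_sorted(rows):
    """Toy model (hypothetical) of a non-buffering sorted writer: each
    (row_key, columns) pair is appended as soon as it arrives, so nothing is
    buffered, but keys must arrive in strictly increasing order.
    Plain string order stands in for Cassandra's partitioner order."""
    sstable = []
    last_key = None
    for row_key, columns in rows:
        if last_key is not None and row_key <= last_key:
            raise ValueError(f"row {row_key!r} arrived after {last_key!r}")
        sstable.append((row_key, columns))
        last_key = row_key
    return sstable


print(len(write_sorted([("a", {"c": 1}), ("b", {"c": 2})])))  # 2
try:
    write_sorted([("b", {}), ("a", {})])
except ValueError as err:
    print(err)  # row 'a' arrived after 'b'
```

In this model, feeding the writer straight from an unsorted database export would mean sorting the export first. That sorting is the extra work on the source server that the reply below objects to.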

Hmmm... probably not realistic in my situation. Doing so would likely
thrash the disks on my PG server a lot more and kill my read
throughput, and that server is already hitting a wall.

>> Right now my Cassandra data store has about 4 months of data and we
>> have 5 years of historical
> ingest all the histories!

Actually, I was a little worried about how much space that would
take... my estimate was ~305GB/year, which is a lot when you consider
the 300-400GB/node limit (something I didn't know about at the time).
However, compression has turned out to be extremely efficient on my
dataset... just under 4 months of data is less than 2GB!  I'm pretty

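The following back-of-envelope arithmetic uses only the figures above. It assumes linear growth and assumes that the ~2GB figure covers the same data as the ~305GB/year estimate. It ignores replication factor and node count, neither of which is stated. The helper `storage_estimate` is purely illustrative.

```python
def storage_estimate(gb_per_year, years, compressed_gb, compressed_months):
    """Illustrative arithmetic (hypothetical helper) on the thread's figures.
    Assumes volume grows linearly with time and that compressed_gb covers
    the same data that gb_per_year describes. Ignores replication."""
    raw_total = gb_per_year * years                     # uncompressed, all years
    raw_sample = gb_per_year * compressed_months / 12   # uncompressed, sample period
    ratio = raw_sample / compressed_gb                  # implied compression ratio
    compressed_total = compressed_gb * (12 / compressed_months) * years
    return raw_total, ratio, compressed_total


raw, ratio, compressed = storage_estimate(305, 5, 2, 4)
print(f"raw: {raw} GB, ratio: ~{ratio:.0f}x, compressed: ~{compressed:.0f} GB")
# raw: 1525 GB, ratio: ~51x, compressed: ~30 GB
```

Because the sample is "less than 2GB" for "just under 4 months", these are rough figures. Even so, they suggest the 5-year history would fit comfortably under the per-node limit once compressed.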
Aaron Turner         Twitter: @synfinatic - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
