cassandra-user mailing list archives

From Aaron Turner <synfina...@gmail.com>
Subject optimizing use of sstableloader / SSTableSimpleUnsortedWriter
Date Sat, 25 Aug 2012 00:56:07 GMT
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader /
SSTableSimpleUnsortedWriter to migrate time series data from our old
datastore (PostgreSQL) to Cassandra?  After thinking about how
sstables are laid out on disk, it seems best (required??) to write out
each row at once.  I.e.: if each row == one year's worth of data and
you have say 30,000 rows, write one full row at a time (a full year's
worth of data points for a given metric) rather than one data point
for each of 30,000 rows.
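To illustrate the reorganization step described above, here is a minimal pure-Python sketch (the metric names and the shape of the query results are hypothetical): interleaved point-per-row results, as a PostgreSQL query over a time series table might return them, are grouped by row key first, so each full row can then be handed to the sstable writer in a single pass rather than revisiting 30,000 rows per data point.

```python
from collections import defaultdict

# Simulated point-per-row result set from the old datastore:
# (metric_key, timestamp, value) tuples, interleaved across metrics.
points = [
    ("metric_a", 1, 10.0),
    ("metric_b", 1, 20.0),
    ("metric_a", 2, 11.0),
    ("metric_b", 2, 21.0),
]

def group_by_row(points):
    """Collect interleaved data points into one complete row per key,
    so each row can be emitted to the sstable writer in one pass."""
    rows = defaultdict(list)
    for key, ts, value in points:
        rows[key].append((ts, value))
    return rows

rows = group_by_row(points)
for key, columns in sorted(rows.items()):
    # With the Java API this would be roughly:
    #   writer.newRow(bytes(key));
    #   for each (ts, value): writer.addColumn(name, value, timestamp);
    print(key, "->", len(columns), "columns")
```

In practice the same effect can come from ordering the PostgreSQL query by metric key, so rows stream out already grouped and nothing needs to be buffered in memory.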

Any other tips to improve load time or reduce the load on the cluster
or subsequent compaction activity?   All the CFs I'll be writing to
use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we
have 5 years of historical data (not sure how much we'll actually
load, but minimally one year's worth).

Thanks!

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
