cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
Date Wed, 29 Aug 2012 04:46:29 GMT
> dataset... just under 4 months of data is less than 2GB!  I'm pretty
> thrilled.
Be thrilled by all the compressions ! :)

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 6:10 AM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Mon, Aug 27, 2012 at 1:19 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> After thinking about how
>> sstables are done on disk, it seems best (required??) to write out
>> each row at once.
>> 
>> Sort of. We only want one instance of the row per SSTable created.
> 
> Ah, good clarification, although I think for my purposes they're one
> and the same.
> 
> 
>> Any other tips to improve load time or reduce the load on the cluster
>> or subsequent compaction activity?
>> 
>> Fewer SSTables means less compaction. So go as high as you can on the
>> bufferSizeInMB param for the
>> SSTableSimpleUnsortedWriter.
> 
> Ok.
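
A rough illustration of the two points above (one instance of a row per SSTable, and a bigger bufferSizeInMB giving fewer SSTables). This is a hypothetical Java toy, not Cassandra's API: the class BufferedWriterModel, its byte accounting and the tiny limits are made up. It only assumes the general idea behind SSTableSimpleUnsortedWriter: rows arrive in any order, columns for the same row key are merged in a sorted in-memory buffer, and a sorted SSTable is flushed whenever the buffer fills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of a buffered, unsorted SSTable writer (not Cassandra's API).
class BufferedWriterModel {
    private final long bufferLimitBytes;
    // Sorted buffer: row key -> (column name -> value). Same-key columns merge here.
    private final TreeMap<String, TreeMap<String, String>> buffer = new TreeMap<>();
    private long bufferedBytes = 0;
    // Each flushed "SSTable" is modelled as the ordered list of row keys it holds.
    final List<List<String>> sstables = new ArrayList<>();

    BufferedWriterModel(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    void addColumn(String rowKey, String column, String value) {
        buffer.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
        // Crude size estimate; a real writer tracks serialized size instead.
        bufferedBytes += rowKey.length() + column.length() + value.length();
        if (bufferedBytes > bufferLimitBytes) {
            flush();
        }
    }

    void close() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) return;
        // TreeMap iterates in key order, so each SSTable is sorted and holds each row once.
        sstables.add(new ArrayList<>(buffer.keySet()));
        buffer.clear();
        bufferedBytes = 0;
    }

    // Writes the same interleaved input with a given buffer limit.
    static List<List<String>> run(long limit) {
        BufferedWriterModel w = new BufferedWriterModel(limit);
        w.addColumn("b", "c1", "v1");
        w.addColumn("a", "c1", "v1");
        w.addColumn("b", "c2", "v2");
        w.addColumn("a", "c2", "v2");
        w.close();
        return w.sstables;
    }

    public static void main(String[] args) {
        for (long limit : new long[] {8, 1_000}) {
            System.out.println("limit=" + limit + " -> " + run(limit));
        }
    }
}
```

With the 8-byte limit both rows end up split across two SSTables ([[a, b], [a, b]]); with 1000 bytes everything lands in a single SSTable ([[a, b]]). Those extra partial copies of each row are what compaction later has to merge.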
> 
>> There is also an SSTableSimpleWriter. Because it expects rows to be ordered,
>> it does not buffer and can create bigger sstables.
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
> 
> Hmmm.... prolly not realistic in my situation... doing so would likely
> thrash the disks on my PG server a lot more and kill my read
> throughput and that server is already hitting a wall.
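
For comparison, a rough sketch of why a sorted, unbuffered writer needs no big buffer but pushes the sorting work onto the data source. Again a hypothetical Java toy, not SSTableSimpleWriter itself: it remembers only the last key written and refuses any row that does not sort after it. Plain String order is an assumption standing in for Cassandra's real row order, which follows the partitioner (with RandomPartitioner that is MD5 token order, not key order), so feeding such a writer usually means sorting on the source side first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an unbuffered writer that requires pre-sorted rows.
class SortedStreamingWriterModel {
    private String lastKey = null;
    // Stands in for the output file; a real writer would append each row to disk.
    final List<String> written = new ArrayList<>();

    void writeRow(String rowKey) {
        // Assumption: String order replaces the partitioner's token order.
        if (lastKey != null && rowKey.compareTo(lastKey) <= 0) {
            throw new IllegalStateException(
                "row " + rowKey + " is not after previous row " + lastKey);
        }
        written.add(rowKey);
        lastKey = rowKey;
    }

    public static void main(String[] args) {
        SortedStreamingWriterModel w = new SortedStreamingWriterModel();
        w.writeRow("a");
        w.writeRow("b");
        try {
            w.writeRow("a"); // out of order, so it is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(w.written);
    }
}
```

Since nothing has to be held back for sorting, output size is not capped by a buffer; the trade-off is exactly the one raised here, since the source (the PG server) has to deliver rows already in order.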
> 
>> 
>> Right now my Cassandra data store has about 4 months of data and we
>> have 5 years of historical
>> 
>> ingest all the histories!
> 
> Actually, I was a little worried about how much space that would
> take... my estimate was ~305GB/year, which is a lot when you consider
> the 300-400GB/node limit (something I didn't know about at the time).
> However, compression has turned out to be extremely efficient on my
> dataset... just under 4 months of data is less than 2GB!  I'm pretty
> thrilled.
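
The numbers in that paragraph can be multiplied out into a rough capacity check. This sketch uses only figures quoted in the thread (~305GB/year estimated, 5 years of history, 300-400GB per node, under 2GB for roughly 4 months observed); the class and method names are made up. It ignores the replication factor, which multiplies the data the cluster must hold, and assumes the estimate and the on-disk figure measure the same data.

```java
// Hypothetical back-of-the-envelope estimate using figures from this thread only.
// Replication factor and per-node overhead are ignored.
class StorageEstimate {
    static double totalGb(double gbPerYear, double years) {
        return gbPerYear * years;
    }

    static int nodesNeeded(double totalGb, double gbPerNode) {
        return (int) Math.ceil(totalGb / gbPerNode);
    }

    public static void main(String[] args) {
        double estimatedGbPerYear = 305.0;  // pre-load estimate from the thread
        double observedGbPerYear = 2.0 * 3; // ~2GB per ~4 months, roughly 6 GB/year
        double years = 5.0;                 // historical data still to be loaded

        double estimated = totalGb(estimatedGbPerYear, years); // 1525 GB
        double observed = totalGb(observedGbPerYear, years);   // about 30 GB
        System.out.printf("estimate: %.0f GB, %d to %d nodes at 400 to 300 GB/node%n",
                estimated, nodesNeeded(estimated, 400.0), nodesNeeded(estimated, 300.0));
        System.out.printf("observed rate: about %.0f GB, %d node(s) at 300 GB/node%n",
                observed, nodesNeeded(observed, 300.0));
    }
}
```

On the original estimate the history alone would fill roughly 4 to 6 nodes; at the observed compressed rate it is on the order of 30GB.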
> 
> 
> -- 
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>    -- Benjamin Franklin
> "carpe diem quam minimum credula postero"

