From: aaron morton
Subject: Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
Date: Wed, 29 Aug 2012 16:46:29 +1200
To: user@cassandra.apache.org

> dataset... just under 4 months of data is less than 2GB! I'm pretty
> thrilled.

Be thrilled by all the compressions ! :)

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 6:10 AM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Mon, Aug 27, 2012 at 1:19 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> After thinking about how
>> sstables are done on disk, it seems best (required??) to write out
>> each row at once.
>>
>> Sort of. We only want one instance of the row per SSTable created.
>
> Ah, good clarification, although I think for my purposes they're one
> and the same.
>
>
>> Any other tips to improve load time or reduce the load on the cluster
>> or subsequent compaction activity?
>>
>> Fewer SSTables means less compaction. So go as high as you can on the
>> bufferSizeInMB param for the
>> SSTableSimpleUnsortedWriter.
>
> Ok.
>
>> There is also a SSTableSimpleWriter. Because it expects rows to be ordered
>> it does not buffer and can create bigger sstables.
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
>
> Hmmm.... prolly not realistic in my situation... doing so would likely
> thrash the disks on my PG server a lot more and kill my read
> throughput and that server is already hitting a wall.
>
>>
>> Right now my Cassandra data store has about 4 months of data and we
>> have 5 years of historical
>>
>> ingest all the histories!
>
> Actually, I was a little worried about how much space that would
> take... my estimate was ~305GB/year, which is a lot when you consider
> the 300-400GB/node limit (something I didn't know about at the time).
> However, compression has turned out to be extremely efficient on my
> dataset... just under 4 months of data is less than 2GB! I'm pretty
> thrilled.
>
>
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
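
The advice in this thread about raising bufferSizeInMB can be illustrated with a toy model. The sketch below is not the Cassandra API: `ToyUnsortedWriter`, `buffer_limit`, and `load` are hypothetical names, and the buffer is counted in columns rather than megabytes. It only shows the general idea that a writer which buffers unsorted input and flushes a sorted table when the buffer fills will produce fewer, larger tables when the buffer is bigger, and that each flushed table holds at most one instance of a given row.

```python
from collections import defaultdict


class ToyUnsortedWriter:
    """Toy stand-in for a buffering bulk writer (hypothetical, not Cassandra code)."""

    def __init__(self, buffer_limit):
        self.buffer_limit = buffer_limit  # assumption: limit counted in columns, not MB
        self.buffer = defaultdict(dict)   # row key -> {column name: value}
        self.buffered_columns = 0
        self.tables = []                  # each flush appends one sorted "table"

    def add(self, row_key, column, value):
        # Columns for the same row are merged in the buffer, so a row
        # appears at most once in any single flushed table.
        if column not in self.buffer[row_key]:
            self.buffered_columns += 1
        self.buffer[row_key][column] = value
        if self.buffered_columns >= self.buffer_limit:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Rows are sorted by key at flush time, so input order does not matter.
        table = [(key, dict(sorted(cols.items())))
                 for key, cols in sorted(self.buffer.items())]
        self.tables.append(table)
        self.buffer.clear()
        self.buffered_columns = 0


def load(points, buffer_limit):
    """Feed (row_key, column, value) tuples through the toy writer."""
    writer = ToyUnsortedWriter(buffer_limit)
    for row_key, column, value in points:
        writer.add(row_key, column, value)
    writer.flush()  # write out whatever is still buffered
    return writer.tables


# 12 data points for 3 rows, arriving interleaved rather than row by row.
points = [(f"host{t % 3}", t, t * 10) for t in range(12)]
small = load(points, buffer_limit=3)
large = load(points, buffer_limit=12)
print(len(small), len(large))  # prints: 4 1
```

In this toy run the small buffer spreads every row across four tables, while the large buffer writes each row once into a single table. That difference is the extra merging that compaction would otherwise have to do later.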