From: Chris Lohfink
Subject: Re: Storing log structured data in Cassandra without compactions for performance boost.
Date: Wed, 7 May 2014 15:20:02 -0500
To: user@cassandra.apache.org

What does your data model look like?

> I think it would be best to just disable compactions.

Why? Are you never doing reads? There is also a cost to repairs and bootstrapping when you have a ton of SSTables. This might be a premature optimization.

If the data is read from a slice of a partition that has been added to over time, there will be a part of that row in almost every SSTable. That would mean all of them would have to be read (multiple disk seeks per SSTable, depending on clustering order) in order to service the query. The data model can help or hurt a lot, though.
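
To make that read cost concrete, here is a toy Python sketch. It is not Cassandra code: the flush model and the names `flush`, `compact` and `sstables_touched` are made up for illustration. It assumes each memtable flush produces one immutable SSTable, so a partition that is appended to over time leaves a fragment in every one of them until a compaction merges the fragments.

```python
from collections import defaultdict

def flush(memtable):
    """Model a memtable flush: freeze the rows into one immutable 'SSTable'."""
    return {pk: dict(cols) for pk, cols in memtable.items()}

def sstables_touched(sstables, partition_key):
    """Count the SSTables that hold a fragment of this partition."""
    return sum(1 for sst in sstables if partition_key in sst)

def compact(sstables):
    """Model a compaction: merge every fragment of each partition into one SSTable."""
    merged = defaultdict(dict)
    for sst in sstables:
        for pk, cols in sst.items():
            merged[pk].update(cols)
    return [dict(merged)]

# Hypothetical workload: 50 flushes, each adding 10 log lines to one partition.
sstables = []
for n in range(50):
    memtable = {"app-logs": {n * 10 + i: f"log line {n * 10 + i}" for i in range(10)}}
    sstables.append(flush(memtable))

before = sstables_touched(sstables, "app-logs")
after = sstables_touched(compact(sstables), "app-logs")
print(before, after)  # prints: 50 1
```

With compaction disabled, the read stays at the "before" number and keeps growing with every flush.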

If you set a TTL on the columns you add, C* will clean up SSTables (if size-tiered and post-1.2) once the data has expired. Since you never delete, set gc_grace_seconds to 0 so that TTL expiration doesn't result in tombstones.
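
A rough sketch of what those settings could look like as table options, written as a small Python helper that only builds the CQL string and does not talk to a cluster. The keyspace, table and column names, the per-day partition bucket and the 30-day TTL are all assumptions for illustration. `default_time_to_live` as a table option needs Cassandra 2.0 or later; on older versions you would put `USING TTL` on each insert instead.

```python
def log_table_cql(keyspace="logs", table="lines", ttl_seconds=30 * 24 * 3600):
    """Build a CREATE TABLE statement for TTL'd, never-deleted log data.

    All names and the TTL value are hypothetical; adjust to your schema.
    """
    return (
        f"CREATE TABLE {keyspace}.{table} (\n"
        "    day text,        -- partition key: one bucket per day (assumed)\n"
        "    ts timeuuid,     -- clustering column: time-ordered within the day\n"
        "    line text,\n"
        "    PRIMARY KEY (day, ts)\n"
        ") WITH compaction = {'class': 'SizeTieredCompactionStrategy'}\n"
        f"  AND default_time_to_live = {ttl_seconds}\n"
        "  AND gc_grace_seconds = 0;"
    )

print(log_table_cql())
```

Setting gc_grace_seconds to 0 is only reasonable here because nothing is ever deleted explicitly.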

---
Chris Lohfink



On May 6, 2014, at 7:55 PM, Kevin Burton <burton@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra…
>
> Every record is a unique timestamp for the key, and then the log line for the value.
>
> I think it would be best to just disable compactions.
>
> - there will never be any deletes.
>
> - all the data will be accessed in time range (probably partitioned randomly) and sequentially.
>
> So every time a memtable flushes, we will just keep that SSTable forever.
>
> Compacting the data is kind of redundant in this situation.
>
> I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered.
>
> Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible?
>
> --
>
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
>
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.

