From: Kevin Burton
Date: Sat, 3 Sep 2011 16:00:39 -0700
Subject: Not all data structures need timestamps (and don't require wasted memory).
To: user@cassandra.apache.org

I was thinking more about the excessive (IMO) use of memory in Cassandra due to the 8 bytes of timestamp stored per column/row (cell).

Any operation that is idempotent does not require a timestamp.

For example, set membership.

A link adjacency list is a good example.

If you have a list of source->targets, adding a new member to 'targets' shouldn't require another timestamp because multiple additions end up with the same result (it is idempotent).

This can be modeled by just adding another column.
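
To make the idempotence argument concrete, here is a minimal Python sketch. It assumes a purely hypothetical model in which a row is a set of column names with no timestamps; this is not Cassandra's storage format or API, and `add_target` and `merge_replicas` are names invented for the illustration. It covers additions only; removals are not modeled.

```python
# Hypothetical model: a row is a set of column names, with no timestamps.
# This illustrates the argument only; it is not how Cassandra stores data.

def add_target(row: frozenset, target: str) -> frozenset:
    """Add one target; repeating the same add changes nothing."""
    return row | {target}

def merge_replicas(a: frozenset, b: frozenset) -> frozenset:
    """Reconcile two replicas by set union; no timestamps are compared."""
    return a | b

replica_1 = frozenset()
replica_2 = frozenset()

# The same logical writes arrive in different orders, one of them twice.
for t in ["b.example", "c.example", "b.example"]:
    replica_1 = add_target(replica_1, t)
for t in ["c.example", "b.example"]:
    replica_2 = add_target(replica_2, t)

merged = merge_replicas(replica_1, replica_2)
print(sorted(merged))
```

Because union is commutative and idempotent, both replicas converge on the same set regardless of arrival order or duplicates, which is the property the timestamp would otherwise be used to settle.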

The results of ETL jobs that are being bulk loaded back into Cassandra don't require timestamps. You could create a long-running ZK lock to represent each load to prevent multiple writers per key.
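
A rough sketch of the one-lock-per-load idea, in Python. The `LoadLocks` class is an in-memory stand-in for a ZooKeeper lock and is entirely hypothetical, as are `bulk_load` and the load id; a real deployment would use an actual ZooKeeper client and would have to deal with session expiry, which this ignores.

```python
import threading
from contextlib import contextmanager

class LoadLocks:
    """In-memory stand-in for a ZooKeeper lock service (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    @contextmanager
    def hold(self, load_id: str):
        # Refuse a second writer for a load that is already in progress.
        with self._guard:
            if load_id in self._held:
                raise RuntimeError(f"load {load_id!r} already has a writer")
            self._held.add(load_id)
        try:
            yield
        finally:
            with self._guard:
                self._held.discard(load_id)

def bulk_load(store: dict, locks: LoadLocks, load_id: str, rows: dict) -> None:
    # With a single writer per load, the last value written is simply the
    # value, so nothing is stored next to the cell to order the writes.
    with locks.hold(load_id):
        for key, value in rows.items():
            store[key] = value

store = {}
locks = LoadLocks()
bulk_load(store, locks, "etl-2011-09-03", {"k1": "v1", "k2": "v2"})
print(store)
```

The lock is taken once per load rather than once per key, matching the "long-running lock per load" suggestion; per-key ordering then follows from there being only one writer.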

In these scenarios, timestamps are just a waste of memory. It's a significant one as well. For our usage it will require 3-4x more memory to deploy Cassandra... I'm not really champing at the bit to pay an extra $120-150k per month in hosting costs... though I'm sure my hosting provider would love it :)
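
As a back-of-the-envelope illustration of how a fixed 8 bytes per cell can turn into a multiple of that size, the Python sketch below computes the ratio for a few payload sizes. The payload sizes are hypothetical and are not measurements from this deployment; the calculation also ignores every other per-column cost (name, lengths, JVM object overhead), so it is a lower-bound style sketch, not a sizing tool.

```python
TIMESTAMP_BYTES = 8  # per-cell timestamp size stated in the message

def memory_factor(payload_bytes: float) -> float:
    """Ratio of (payload + timestamp) to payload alone for one cell."""
    return (payload_bytes + TIMESTAMP_BYTES) / payload_bytes

# Hypothetical payload sizes, chosen only to show the trend.
for payload in (2, 4, 8, 64):
    print(f"{payload:>3} B payload -> {memory_factor(payload):.2f}x")
```

The smaller the cell, the larger the relative cost: under these assumptions a 4-byte payload already triples in size, while a 64-byte payload grows by about an eighth.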

Kevin

--
Founder/CEO Spinn3r.com

Location: San Francisco, CA
Skype: burtonator
Skype-in: (415) 871-0687

