Subject: Re: Blob vs. "normal" columns (internals) difference?
From: aaron morton
Date: Thu, 4 Apr 2013 06:58:11 +0530
To: user@cassandra.apache.org
Message-Id: <649E3664-81A9-4669-81E7-C968F13E126F@thelastpickle.com>

> 1. Is size getting bigger in either one in storing one Tweet?

If you store the data in one blob, then we only store one column name and the blob. If the fields are in different columns, then we store each column name and its value.

> 2. Does either choice have an impact on read/write performance at large scale?

If you store data in a blob you can only read and update it as a blob, so chances are you will be wasting effort doing read-modify-write operations.

Unless you have a good reason not to, split things up and store them as columns.

cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/04/2013, at 1:08 PM, Alan Ristić wrote:

> Hi guys,
>
> Here is an example (fictional) model I have for learning purposes...
>
> I'm currently storing the "User" object in a Tweet as a blob value. So taking the JSON of 'User' and storing it as a blob. I'm wondering why this is better vs. just prefixing and flattening column names?
>
> Tweet {
>   id uuid,
>   user blob
> }
>
> vs.
>
> Tweet {
>   id uuid,
>   user_id uuid,
>   user_name text,
>   ....
> }
>
> In one or the other:
>
> 1. Is size getting bigger in either one in storing one Tweet?
> 2. Does either choice have an impact on read/write performance at large scale?
> 3. Anything else I should be considering here?
> Your view/thinking would be great.
>
> Here is my understanding:
> For 'ease' of update, if for example a user changes their name, I'm aware I need to (re)write the whole object in all Tweets in the first "blob" example and only the user_name column in the second 'flattened' example. Which brings me to this: if I wanted to actually do this "updating/rewriting" for every Tweet, I'd use the second 'flattened' example, since the payload of only user_name is smaller than the whole User blob for every Tweet, right?
>
> Nothing urgent, any input is valuable, tnx guys :)
>
> Thanks and best regards,
> Alan Ristić
>
> w: personal blog
> t: @alanristic
> l: linkedin.com/alanristic
> m: 068 15 73 88
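
For anyone reading this thread later, here is a rough sketch of the read-modify-write point discussed above. It uses plain Python dicts as stand-ins for one Tweet row in each layout, so it is not Cassandra client code and says nothing about actual on-disk sizes. The `bio` / `user_bio` field, the sample values, and the function names are all made up for illustration.

```python
import json

# Hypothetical in-memory stand-ins for one Tweet row in each layout.
# The "bio" / "user_bio" field is invented; the thread only names
# user_id and user_name.
blob_row = {
    "id": "tweet-1",
    "user": json.dumps(
        {"user_id": "u-42", "user_name": "alan", "bio": "learning Cassandra"}
    ).encode("utf-8"),
}
flat_row = {
    "id": "tweet-1",
    "user_id": "u-42",
    "user_name": "alan",
    "user_bio": "learning Cassandra",
}


def rename_in_blob(row, new_name):
    """Read-modify-write: decode the whole blob, change one field, re-encode it all."""
    user = json.loads(row["user"].decode("utf-8"))
    user["user_name"] = new_name
    payload = json.dumps(user).encode("utf-8")
    row["user"] = payload
    return len(payload)  # bytes that have to be written back


def rename_in_columns(row, new_name):
    """Write only the user_name column; nothing has to be read first."""
    row["user_name"] = new_name
    return len(new_name.encode("utf-8"))  # bytes of the one changed value


blob_bytes = rename_in_blob(blob_row, "alan_r")
flat_bytes = rename_in_columns(flat_row, "alan_r")
print("blob rewrite payload:", blob_bytes, "bytes")
print("single column payload:", flat_bytes, "bytes")
```

The blob version has to fetch and rewrite every field of the user to change one of them, while the flattened version touches only the value that changed, which is the wasted effort mentioned in the reply.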