Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Subject: Re: Blob vs. "normal" columns (internals) difference?
From: aaron morton <aaron@thelastpickle.com>
In-Reply-To: 
 <CALQqw3g=Wn5o2AcyqBjko=syGG3htmOOm-CX4z1w+_9OuBRnOA@mail.gmail.com>
Date: Thu, 4 Apr 2013 06:58:11 +0530
Message-Id: <649E3664-81A9-4669-81E7-C968F13E126F@thelastpickle.com>
References: 
 <CALQqw3g=Wn5o2AcyqBjko=syGG3htmOOm-CX4z1w+_9OuBRnOA@mail.gmail.com>
To: user@cassandra.apache.org

> 1. Is size getting bigger in either one in storing one Tweet?
If you store the data in one blob, we store only one column name and
the blob. If the fields are in separate columns, we store each column
name and its value.
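
As a rough illustration of that accounting, here is a minimal Python
sketch. It is not a measurement of Cassandra's on-disk format: the
per-column overhead constant, the helper name stored_bytes, and the
sample user fields are all assumptions made up for the example.

```python
import json

# Assumption: a fixed per-column bookkeeping cost (timestamp, lengths, flags).
# The value 15 is illustrative and not taken from the original message.
ASSUMED_COLUMN_OVERHEAD = 15


def stored_bytes(columns, overhead=ASSUMED_COLUMN_OVERHEAD):
    """Rough row size: each column costs its name, its value, and overhead."""
    return sum(
        len(name.encode("utf-8")) + len(value) + overhead
        for name, value in columns.items()
    )


# Hypothetical user fields, invented for the example.
user = {"id": "42", "name": "alan"}

# Layout 1: the whole user serialized into a single blob column.
blob_row = {"user": json.dumps(user).encode("utf-8")}

# Layout 2: one column per user field, with a prefixed column name.
flat_row = {"user_" + key: value.encode("utf-8") for key, value in user.items()}

print("blob layout:", stored_bytes(blob_row))
print("flat layout:", stored_bytes(flat_row))
```

Note that a JSON blob still carries the field names inside its value,
so the saving comes mostly from paying the per-column overhead once
rather than once per field.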

> 2. Has either choice have impact on read/write performance on large
> scale?
If you store data in a blob, you can only read and update it as a blob,
so chances are you will waste effort on read-modify-write operations.
Unless you have a good reason to use a blob, split things up and store
them as columns.
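
To make the read-modify-write point concrete, here is a small Python
sketch. It models a row as a plain dict of column name to bytes, which
is an assumption for illustration and not a Cassandra client API; the
function names and sample fields are hypothetical as well.

```python
import json

# A row is modelled as a dict of column name -> bytes (illustrative only).


def rename_in_blob(row, new_name):
    """Read-modify-write: fetch the blob, decode it, change one field,
    then write the whole blob back. Returns the bytes written."""
    user = json.loads(row["user"].decode("utf-8"))   # read
    user["name"] = new_name                          # modify
    row["user"] = json.dumps(user).encode("utf-8")   # write everything back
    return len(row["user"])


def rename_in_columns(row, new_name):
    """Blind write: overwrite only the column that changed, no read needed.
    Returns the bytes written."""
    row["user_name"] = new_name.encode("utf-8")
    return len(row["user_name"])


# Hypothetical user with one large field that never changes.
blob_row = {
    "user": json.dumps(
        {"id": "42", "name": "alan", "bio": "x" * 200}
    ).encode("utf-8")
}
flat_row = {"user_id": b"42", "user_name": b"alan", "user_bio": b"x" * 200}

blob_written = rename_in_blob(blob_row, "alan_r")
flat_written = rename_in_columns(flat_row, "alan_r")
print("bytes written (blob):", blob_written)
print("bytes written (columns):", flat_written)
```

In the blob case the unchanged bio is read and rewritten along with
the name; in the column case only user_name is touched.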

cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/04/2013, at 1:08 PM, Alan Ristić <alan.ristic@gmail.com> wrote:

> Hi guys,
>
> Here is an example (fictional) model I have for learning purposes...
>
> I'm currently storing the "User" object in a Tweet as a blob value,
> that is, taking the JSON of 'User' and storing it as a blob. I'm
> wondering why this is better than just prefixing and flattening
> column names?
>
> Tweet {
>  id uuid,
>  user blob
> }
>
> vs.
>
> Tweet {
>  id uuid,
>  user_id uuid,
>  user_name text,
>  ....
> }
>
> In one or the other:
>
> 1. Is size getting bigger in either one in storing one Tweet?
> 2. Has either choice have impact on read/write performance on large
> scale?
> 3. Anything else I should be considering here? Your view/thinking
> would be great.
>
> Here is my understanding:
> For 'ease' of update: if, for example, a user changes their name, I'm
> aware I need to (re)write the whole object in all Tweets in the first
> "blob" example, and only the user_name column in the second
> 'flattened' example. So if I actually wanted to do this
> updating/rewriting for every Tweet, I'd use the second 'flattened'
> example, since the payload of only user_name is smaller than the
> whole User blob for every Tweet, right?
>
> Nothing urgent, any input is valuable, tnx guys :)
>
> Thanks and best regards,
> Alan Ristić
>
> w: personal blog
> t: @alanristic
> l: linkedin.com/alanristic
> m: 068 15 73 88