Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
Date: Sun, 5 Jun 2011 14:43:38 -0400
Message-ID: <BANLkTimLCgSYOBG_RbcuDLU_95Lh1rkYGg@mail.gmail.com>
Subject: Paging Columns from a Row
From: Joseph Stein <cryptcom@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=20cf3043493077386e04a4fb5dde

--20cf3043493077386e04a4fb5dde
Content-Type: text/plain; charset=ISO-8859-1

What is the best practice for paging and slicing columns from a row?

Say I have 1,000,000 columns in a row.

I read the row, but I want one thread (an actor, in my case) to read columns
0 - 9999, a second to read 10000 - 19999, and so on, so that I can have 100
workers each processing 10,000 columns for each of my rows.
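
To make the split concrete, here is a minimal Python sketch of the arithmetic only. It does not talk to Cassandra, and `worker_ranges` is a hypothetical helper name I made up for illustration, not an existing API.

```python
def worker_ranges(total_columns, columns_per_worker):
    """Return inclusive (first, last) column positions, one pair per worker."""
    ranges = []
    for first in range(0, total_columns, columns_per_worker):
        # The final range is clipped if the total is not an exact multiple.
        last = min(first + columns_per_worker, total_columns) - 1
        ranges.append((first, last))
    return ranges


ranges = worker_ranges(1_000_000, 10_000)
print(len(ranges), ranges[0], ranges[1], ranges[-1])
```

Each pair would be handed to one worker; how a worker turns a position range into an actual read is exactly the open question.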

If there is no API for this, is it something I should use a composite key for,
populating the rows with a counter prefix like this?

0000000:myoriginalcolumnnameX
0000001:myoriginalcolumnnameY
0000002:myoriginalcolumnnameZ
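
A rough sketch of how such counter-prefixed names could be sliced, assuming the column names are kept in lexical order. The sorted Python list is only an in-memory stand-in for the row, and `prefixed_name` and `slice_by_counter` are hypothetical helpers, not Cassandra client calls.

```python
from bisect import bisect_left

WIDTH = 7  # digits in the counter prefix, matching the example names above


def prefixed_name(counter, original_name):
    """Build a column name such as '0000001:myoriginalcolumnnameY'."""
    return f"{counter:0{WIDTH}d}:{original_name}"


def slice_by_counter(sorted_names, first, last):
    """Return names whose counter prefix lies in first..last (inclusive).

    Assumes last + 1 still fits in WIDTH digits, so that the fixed-width
    prefixes compare lexically in the same order as the numbers.
    """
    start = f"{first:0{WIDTH}d}:"
    finish = f"{last + 1:0{WIDTH}d}:"  # exclusive upper bound
    lo = bisect_left(sorted_names, start)
    hi = bisect_left(sorted_names, finish)
    return sorted_names[lo:hi]


# Stand-in row: 30,000 prefixed names kept in sorted order.
row = sorted(prefixed_name(i, f"name{i}") for i in range(30_000))
second_worker = slice_by_counter(row, 10_000, 19_999)
print(len(second_worker), second_worker[0], second_worker[-1])
```

The zero padding is what makes this work: without it, "10" would sort before "9" and a start/end range would no longer match a numeric range.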

Going the composite key route and doing a start/end predicate would work, but
it means the insertion/load has to go through a single synchronized point to
generate the column names. I am not opposed to this, but I would prefer that
neither the load nor the processing of my data be bound by any single lock
(even a distributed one).

Thanks!!!!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/


--20cf3043493077386e04a4fb5dde--