Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of watcherfr@gmail.com designates
 209.85.212.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAENxBwwnpvJNXG-DE6vQiMv9w2_jyeH0vc3Cm-qt17b1HH6qvQ@mail.gmail.com>
References: 
 <CANGD+iqveg+g-GA_rP2LnHGH80M4Uy1sWeiDXNfPK8JF-wx=uw@mail.gmail.com>
	<CAENxBwyMc9FYNDhQR2QGQdHU_bNaVt5i8AQr7z2LW9JUQfm_tg@mail.gmail.com>
	<CANGD+iq0mpw16BKLipV8Hg2S89F0fJNwPrp8v7RkXSPQkBuzFA@mail.gmail.com>
	<CAENxBww-dqZRjMR8dWtZ+8KJakKp456OnuHWHvXxbaMxt58ykw@mail.gmail.com>
	<CAHwsXYnETLxKr0pDKBe6ae-K_TT9nSysWsV05ZuYB3XmncsTmQ@mail.gmail.com>
	<CAENxBwy0BQaa7AkbQdeQ+X_JKXcxz_yDXa6MjAL7GydbZw04=A@mail.gmail.com>
	<CANGD+iq1D07YLHV3TzHXGX-thfLAuR2RLr8fk4+EDfpBO+Xsxw@mail.gmail.com>
	<CAENxBwwnpvJNXG-DE6vQiMv9w2_jyeH0vc3Cm-qt17b1HH6qvQ@mail.gmail.com>
Date: Fri, 30 Dec 2011 10:44:13 +0100
Message-ID: 
 <CAHwsXYnDD4jXZwPH03-ng9k-Jw0NuSHNBvWLND7RzzeHVV_Bfg@mail.gmail.com>
Subject: Re: Retrieve all composite columns from a row, whose composite name's
 first component matches from a list of Integers
From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f3bad475cb2d704b54c13cd

--e89a8f3bad475cb2d704b54c13cd
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I currently have:
scf[c1][sc1]=value
scf[c1][sc2]=value
...
scf[c2][sc1]=value
scf[c2][sc2]=value
scf[c2][sc3]=value
scf[c2][sc4]=value

99% of the time, I do multiget super slices: for multiple keys, I query
explicitly for the super columns c1, c2, c10, c12.
1% of the time, I do a range super slice query where, for multiple keys, I
query for a range of super columns.
As Tyler said, this can be done by specifying the super columns in the slice
predicate; each one implicitly returns all of its subcolumns. I use Hector and
it works great.
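To make the named super column slice concrete, here is a minimal in-memory sketch. It assumes a super column family modeled as nested Python dicts (row key -> super column -> subcolumn -> value). `multiget_super_slice` is a hypothetical helper, not a Hector or Thrift call. It only mimics the behavior described above: naming a super column in the predicate returns all of its subcolumns.

```python
# Hypothetical in-memory super column family:
# row key -> super column name -> subcolumn name -> value.
scf = {
    "key1": {
        "c1": {"sc1": "v11", "sc2": "v12"},
        "c2": {"sc1": "v21", "sc2": "v22", "sc3": "v23", "sc4": "v24"},
        "c10": {"sc1": "v101"},
    },
    "key2": {
        "c2": {"sc1": "w21"},
        "c12": {"sc1": "w121", "sc2": "w122"},
    },
}


def multiget_super_slice(cf, keys, super_columns):
    """For each key, return every requested super column that exists,
    with all of its subcolumns (no subcolumn-level filtering)."""
    result = {}
    for key in keys:
        row = cf.get(key, {})
        result[key] = {
            name: dict(row[name]) for name in super_columns if name in row
        }
    return result


res = multiget_super_slice(scf, ["key1", "key2"], ["c1", "c2", "c10", "c12"])
print(res["key2"]["c12"])  # every subcolumn of c12 comes back
```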

Interestingly enough, the subcolumn names sc1, sc2, sc3 are in fact home-made
composites.

I could, and would, switch to full composite columns, because I am fishing for
every drop of performance I can get. However, I would then need this: "Letting
multiget_slice accept multiple SlicePredicates per key could also
accomplish this."
Can anyone on the dev team comment on doing this? Is it a no-no?
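The sketch below is a rough illustration of why multiple SlicePredicates per key would help. It assumes the sc1/sc2 names become real composite names, modeled here as (first, second) tuples kept in sorted order. The helpers are hypothetical and run in memory. With only one predicate per call, the client has to issue one multiget per first component and merge the results itself. That is the round-trip cost the quoted feature would remove.

```python
import bisect

# Hypothetical column family after switching to real composite names:
# row key -> {(first, second): value}, where first = former super column id.
cf = {
    "k1": {
        (1, "a"): "v", (1, "b"): "v", (2, "a"): "v",
        (3, "z"): "v", (10, "c"): "v",
    },
    "k2": {(2, "x"): "w", (12, "y"): "w"},
}


def multiget_slice(cf, keys, first):
    """One predicate shared by all keys: every column whose composite name
    starts with `first`, returned in sorted (comparator) order."""
    out = {}
    for key in keys:
        row = cf.get(key, {})
        names = sorted(row)
        firsts = [n[0] for n in names]
        lo = bisect.bisect_left(firsts, first)
        hi = bisect.bisect_right(firsts, first)
        out[key] = [(n, row[n]) for n in names[lo:hi]]
    return out


def emulate_multi_predicate(cf, keys, firsts):
    """Client-side stand-in for 'several SlicePredicates per key':
    one multiget per first component, merged per key."""
    merged = {key: [] for key in keys}
    calls = 0
    for first in firsts:
        calls += 1  # each iteration would be a separate round trip
        for key, cols in multiget_slice(cf, keys, first).items():
            merged[key].extend(cols)
    return merged, calls


merged, calls = emulate_multi_predicate(cf, ["k1", "k2"], [1, 2, 10, 12])
```

Server-side support for several predicates per key would collapse those `calls` into a single request.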

Thanks

2011/12/29 Edward Capriolo <edlinuxguru@gmail.com>

> Hum...
>
> Do you have this?
> scf [b][1][a]=value
> scf [b][1][x]=value
> scf [b][7][b]=value
>
> and you want to slice:
> scf [b][1][*]
>
> Which would result in
>
> scf [b][1][a]=value
> scf [b][1][x]=value
>
> ?
>
> The composite version of this would be:
> cf [b][1:a]=value
> cf [b][1:x]=value
> cf [b][7:b]=value
>
> I am not sure exactly what you are doing, because a SlicePredicate
> takes either a list of columns or a SliceRange. A ColumnPath takes a
> single SuperColumn.
>
> I do not see how this is done with Columns or SuperColumns. Maybe you
> can provide a code snippet and/or some sample data?
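Edward's equivalence can be checked with a small in-memory sketch. It assumes composite names modeled as (int, str) tuples, sorted component by component. It also uses a hypothetical `SlicePredicate` dataclass that only mirrors the either/or shape he describes (a list of names or a range); it is not the Thrift struct. A range bound given as a one-element prefix such as `(1,)` stands in for "all columns whose first component is 1".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Composite name modeled as (int, str); "1:a" becomes (1, "a").
Name = Tuple[int, str]


@dataclass
class SliceRange:
    start: Optional[tuple] = None   # None = beginning of the row
    finish: Optional[tuple] = None  # None = end of the row
    count: int = 100


@dataclass
class SlicePredicate:
    # Either a list of column names OR a range, never both.
    column_names: Optional[List[Name]] = None
    slice_range: Optional[SliceRange] = None


def in_range(name, r):
    """A bound may be a full name or a prefix such as (1,); comparing only
    name[:len(bound)] makes (1,) cover every name whose first component is 1."""
    if r.start is not None and name[: len(r.start)] < r.start:
        return False
    if r.finish is not None and name[: len(r.finish)] > r.finish:
        return False
    return True


def get_slice(row, pred):
    if (pred.column_names is None) == (pred.slice_range is None):
        raise ValueError("set exactly one of column_names or slice_range")
    if pred.column_names is not None:
        return [(n, row[n]) for n in sorted(set(pred.column_names)) if n in row]
    r = pred.slice_range
    hits = [n for n in sorted(row) if in_range(n, r)]
    return [(n, row[n]) for n in hits[: r.count]]


# Row "b" in both layouts from the example above.
scf_b = {1: {"a": "value", "x": "value"}, 7: {"b": "value"}}
cf_b = {(1, "a"): "value", (1, "x"): "value", (7, "b"): "value"}

# scf [b][1][*]  versus  cf [b][1:*]
super_slice = sorted(scf_b[1].items())
composite_slice = get_slice(
    cf_b, SlicePredicate(slice_range=SliceRange(start=(1,), finish=(1,)))
)
```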
>
> On 12/29/11, Aditya <adynnn@gmail.com> wrote:
> > @Edward: Perhaps you didn't notice that I always need to retrieve 'all
> > columns' under the super column whenever I read it... and given my query
> > requirements, if I use composite columns instead of super columns, it is
> > impossible to do wildcard queries like the one asked in this thread's
> > headline, which are much easier to do with super columns.
> >
> > On Thu, Dec 29, 2011 at 11:06 PM, Edward Capriolo
> > <edlinuxguru@gmail.com> wrote:
> >
> >> The use case in question was: only accessing some columns.
> >>
> >> Even if that is not the case:
> >>
> >> SuperColumns: 1 extra level of nesting
> >> Composite Columns: Arbitrary levels of nesting
> >>
> >> SuperColumns: More overhead (space on disk) than using your own
> >> delimiter '_'
> >> SuperColumns: Likely going to be replaced behind the scenes by composite
> >> columns in a future C* version anyway
> >> SuperColumns: Usually an afterthought for API developers (support for
> >> them comes "later")
> >> SuperColumns: Almost always utilized incorrectly by users; users speak
> >> of '10%' performance gains after they switch away from them.
> >>
> >> There are some cases (a small %) where SuperColumns are a better
> >> choice, but they are rare. With composites and concatenated column
> >> names, super columns no longer have a great purpose; (bad analogy
> >> coming!) they are like a mechanical typewriter.
> >>
> >> On 12/29/11, Philippe <watcherfr@gmail.com> wrote:
> >> > Would you stand by that statement in case all columns inside the super
> >> > column need to be read? Why?
> >> >
> >> > Thanks
> >> > On 28 Dec 2011 19:26, "Edward Capriolo" <edlinuxguru@gmail.com>
> >> > wrote:
> >> >
> >> >> Super columns have the same fundamental problem and perform worse in
> >> >> general. So switching from composites to super columns is NEVER a good
> >> >> idea.
> >> >>
> >> >>
> >> >> On Wed, Dec 28, 2011 at 1:19 PM, Aditya <adynnn@gmail.com> wrote:
> >> >>
> >> >>> Since I have around 20 items to query, I guess making 20 queries to
> >> >>> retrieve activities by all followees on all of those 20 columns would
> >> >>> be too inefficient. So, to take advantage of more efficient queries,
> >> >>> are super columns recommended for this case? In any case, if I use
> >> >>> super columns, I need to retrieve the entire super column every time,
> >> >>> and I write subcolumn(s) to the super column at different times, not
> >> >>> all at once.
> >> >>>
> >> >>> On Wed, Dec 28, 2011 at 8:07 PM, Edward Capriolo
> >> >>> <edlinuxguru@gmail.com> wrote:
> >> >>>
> >> >>>> You need to execute one get slice operation per item ID or, if the
> >> >>>> row is not large, you can try one large get slice on the entire row
> >> >>>> and deal with the results client side.
> >> >>>>
> >> >>>> If you try method 1: when doing slices on composites, you can set
> >> >>>> inclusive or exclusive start and end values to get only the columns
> >> >>>> you want, and not some extra columns up to the slice range size.
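The two approaches can be sketched in memory. The sketch assumes the rowX layout from Aditya's original message, with composite names modeled as (itemId, userId) tuples. `slice_per_item` and `whole_row_then_filter` are hypothetical helpers, not Hector calls. In a real client, method 1 would be one slice query per item, with bounds limited to that itemId.

```python
import bisect

# Hypothetical contents of rowX: (itemId, userId) -> activity data.
row_x = {
    (101, "alice"): "liked",
    (101, "bob"): "shared",
    (205, "carol"): "commented",
    (309, "alice"): "liked",
    (309, "dave"): "bought",
}


def slice_per_item(row, item_ids):
    """Method 1: one bounded slice per itemId; the bounds cover exactly the
    columns whose first component is that itemId, so nothing extra returns."""
    names = sorted(row)
    firsts = [n[0] for n in names]
    result = {}
    for item in item_ids:  # a real client would issue one query per item here
        lo = bisect.bisect_left(firsts, item)
        hi = bisect.bisect_right(firsts, item)
        result[item] = {n[1]: row[n] for n in names[lo:hi]}
    return result


def whole_row_then_filter(row, item_ids):
    """Method 2: read the entire row once and filter client side."""
    result = {item: {} for item in item_ids}
    for (item, user), activity in sorted(row.items()):
        if item in result:
            result[item][user] = activity
    return result


a = slice_per_item(row_x, [101, 309, 999])
b = whole_row_then_filter(row_x, [101, 309, 999])
```

Both return the same data. Method 1 costs one round trip per item. Method 2 costs a single round trip but reads the whole row, which is why it only suits rows that are not large.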
> >> >>>>
> >> >>>>
> >> >>>> On Tuesday, December 27, 2011, Aditya <adynnn@gmail.com> wrote:
> >> >>>> > I need to store data on all activities by a user's followees in a
> >> >>>> > single row. I am trying to do that using composite column names in
> >> >>>> > a single user-specific row named 'rowX'.
> >> >>>> > On any activity by one of a user's followees on an item, a column
> >> >>>> > is stored in 'rowX'. The column has a composite column name made up
> >> >>>> > of itemId+userId (which makes it a unique column name in rowX), and
> >> >>>> > the column value contains the activity data related to that item by
> >> >>>> > that followee.
> >> >>>> >
> >> >>>> > Now I want to retrieve activity by all users on a list of items. So
> >> >>>> > I need to retrieve all composite columns whose first component
> >> >>>> > matches the itemId. Is it possible to do such a query in Cassandra?
> >> >>>> > I am using Hector.
> >> >>>>
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >
>

--e89a8f3bad475cb2d704b54c13cd--