Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of mac.miklas@googlemail.com
 designates 209.85.214.172 as permitted sender)
MIME-Version: 1.0
Reply-To: mac.miklas@gmail.com
In-Reply-To: <4F73A2D3.10507@gmail.com>
References: 
 <CADwHx2qvOXD1dvEFf3XCA5m1G0pw6ueeJh+P6TkEA0jF0TVn5g@mail.gmail.com>
	<4F7175DC.5040804@gmail.com>
	<CALk=J5_USyq8pCaqoZj+Yi4Qb2-vshpH=u9KHwgtTUhwFB9neA@mail.gmail.com>
	<4F71D932.7090102@gmail.com>
	<CALk=J5_bEFrFLMckdg+Ai-6EKsCpjN7PGeB1=Xoraxs1-Z=BOA@mail.gmail.com>
	<4F73A2D3.10507@gmail.com>
Date: Thu, 29 Mar 2012 07:35:46 +0200
Message-ID: 
 <CALk=J590YtynanV6vY70C04k11R-bDQr4vWe2v5Pko2mKhXoBg@mail.gmail.com>
Subject: Re: Schema advice/help
From: Maciej Miklas <mac.miklas@googlemail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d0444ee1f8b676f04bc5b184a

--f46d0444ee1f8b676f04bc5b184a
Content-Type: text/plain; charset=UTF-8

Correct - I also see no other solution for this problem.

On Thu, Mar 29, 2012 at 1:46 AM, Guy Incognito <dnd1066@gmail.com> wrote:

>  well, no.  my assumption is that he knows what the 5 itemTypes (or
> appropriate corresponding ids) are, so he can do a known 5-rowkey lookup.
> if he does not know, then agreed, my proposal is not a great fit.
>
> could do (as originally suggested)
>
> userId -> itemType:activityId
>
> if you want to keep everything in the same row (again assumes that you
> know what the itemTypes are).  but then you can't really do a multiget, you
> have to do 5 separate slice queries, one for each item type.
>
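
To make the single-row variant concrete, here is a rough in-memory sketch in plain Python (not a real Cassandra client call). It assumes composite column names of the form (itemType, activityId) that sort component by component; the item type names and helper functions are made up for illustration. It shows why you end up with one slice per item type:

```python
import bisect

# One row for a single hypothetical user. Column names are
# (item_type, activity_id) tuples kept sorted, which mimics composite
# column names compared component by component.
row = []

ITEM_TYPES = ["comment", "like", "photo", "post", "share"]  # hypothetical


def add_activity(item_type, activity_id):
    """Insert one composite column, keeping the row sorted."""
    bisect.insort(row, (item_type, activity_id))


def slice_latest(item_type, count=10):
    """One slice query: newest `count` columns whose first component matches."""
    lo = bisect.bisect_left(row, (item_type,))
    hi = bisect.bisect_left(row, (item_type, float("inf")))
    return [activity_id for _, activity_id in row[lo:hi][::-1][:count]]


# 60 activities spread evenly over the 5 item types (12 each).
for ts in range(1, 61):
    add_activity(ITEM_TYPES[ts % 5], ts)

# Five separate slices, one per item type, because the 5 ranges are disjoint.
latest = {t: slice_latest(t) for t in ITEM_TYPES}
```

Each call to slice_latest stands in for one reversed column slice with a count of 10, so reading all 50 activities takes five such calls against the same row.
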
> can also do some wacky stuff around maintaining a row that explicitly only
> holds the last 10 items by itemType (meaning you have to delete the oldest
> one every time you insert a new one), but that prolly requires read-on-write
> etc and is a lot messier.  and you will prolly need to worry about the case
> where you (transiently) have more than 10 'latest' items for a single
> itemType.
>
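
The "last 10 only" row could look roughly like the sketch below. Again this is plain in-memory Python under my own assumptions (a set standing in for the row, a made-up insert_capped helper), just to show where the read-on-write and the transient 11th item come from:

```python
MAX_PER_TYPE = 10

# Single row for one hypothetical user: a set of (item_type, activity_id)
# columns that should hold only the newest 10 per item type.
latest_row = set()


def insert_capped(item_type, activity_id):
    """Insert a column, then read back and delete anything beyond the newest 10."""
    latest_row.add((item_type, activity_id))
    # Read-on-write: fetch the current columns for this item type.
    # At this point the row may transiently hold 11 items for the type.
    current = sorted(a for t, a in latest_row if t == item_type)
    for stale in current[:-MAX_PER_TYPE]:
        latest_row.discard((item_type, stale))


for ts in range(1, 31):
    insert_capped("photo", ts)

remaining = sorted(a for _, a in latest_row)
```

In a real cluster the insert and the delete are separate operations, so a concurrent reader can see the extra item in between; the reader would still have to cut the result to 10.
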
> On 28/03/2012 09:49, Maciej Miklas wrote:
>
> yes - but anyway in your example you need a "key range query" and that
> requires OPP, right?
>
> On Tue, Mar 27, 2012 at 5:13 PM, Guy Incognito <dnd1066@gmail.com> wrote:
>
>>  multiget does not require OPP.
>>
>> On 27/03/2012 09:51, Maciej Miklas wrote:
>>
>> multiget would require Order Preserving Partitioner, and this can lead to
>> unbalanced ring and hot spots.
>>
>> Maybe you can use a secondary index on "itemtype" - it must have small
>> cardinality:
>> http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
>>
>>
>>
>> On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito <dnd1066@gmail.com> wrote:
>>
>>> without the ability to do disjoint column slices, i would probably use 5
>>> different rows.
>>>
>>> userId:itemType -> activityId
>>>
>>> then it's a multiget slice of 10 items from each of your 5 rows.
>>>
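
As a rough illustration of the five-row layout quoted above, here is a small in-memory sketch in plain Python (a dict standing in for the column family, not a real client API). The row key format "userId:itemType" is from the proposal; the item type names and function names are assumptions for the example:

```python
# In-memory stand-in for a column family: row key -> {column name: value}.
# Column names are activity ids (timestamps), so sorting them orders by time.
rows = {}

ITEM_TYPES = ["comment", "like", "photo", "post", "share"]  # hypothetical


def add_activity(user_id, item_type, activity_id, payload=""):
    """Write one column into the row for this user and item type."""
    rows.setdefault(f"{user_id}:{item_type}", {})[activity_id] = payload


def multiget_latest(row_keys, count=10):
    """Return the newest `count` column names from each requested row."""
    return {
        key: sorted(rows.get(key, {}), reverse=True)[:count]
        for key in row_keys
    }


# 60 activities spread evenly over the 5 item types (12 each).
for ts in range(1, 61):
    add_activity("user1", ITEM_TYPES[ts % 5], ts)

# The 5 row keys are known up front, so this is a lookup by exact key,
# not a key range scan.
keys = [f"user1:{t}" for t in ITEM_TYPES]
latest = multiget_latest(keys)
```

The point of the sketch is that all 5 keys are computed from the user id and the known item types, so the same reversed slice of 10 columns can be applied to each row in one request.
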
>>>
>>> On 26/03/2012 22:16, Ertio Lew wrote:
>>>
>>>> I need to store activities by each user, on 5 item types. I always
>>>> want to read the last 10 activities on each item type, by a user (i.e.,
>>>> total activities to read at a time = 50).
>>>>
>>>> I want to store these activities in a single row for each user so
>>>> that they can be retrieved in a single row query, since I want to read all
>>>> the last 10 activities on each item. I am thinking of creating composite
>>>> names appending "itemtype" : "activityId" (activityId is just a timestamp
>>>> value) but then I don't see how to read the last 10 activities from
>>>> all itemtypes.
>>>>
>>>> Any ideas about a schema to do this in a better way?
>>>>
>>>
>>>
>>
>>
>
>

--f46d0444ee1f8b676f04bc5b184a--