From: Maciej Miklas
To: user@cassandra.apache.org
Date: Thu, 29 Mar 2012 07:35:46 +0200
Subject: Re: Schema advice/help

correct - I see also no other solution for this problem

On Thu, Mar 29, 2012 at 1:46 AM, Guy Incognito <dnd1066@gmail.com> wrote:
well, no. my assumption is that he knows what the 5 itemTypes (or appropriate corresponding ids) are, so he can do a known 5-rowkey lookup. if he does not know, then agreed, my proposal is not a great fit.

could do (as originally suggested)

userId -> itemType:activityId

if you want to keep everything in the same row (again assumes that you know what the itemTypes are). but then you can't really do a multiget, you have to do 5 separate slice queries, one for each item type.
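A minimal sketch of the single-row layout described above, using an in-memory sorted list as a stand-in for one wide row. This is not Cassandra client code: the item type names, `row_by_user`, and the helper functions are all hypothetical, and the composite column name is modelled as an `(itemType, activityId)` tuple.

```python
import bisect

# Hypothetical item types; the thread only says there are 5 of them.
ITEM_TYPES = ["photo", "video", "link", "note", "event"]

# One "row" per user: a list of (itemType, activityId) columns kept sorted,
# which mimics how composite column names sort inside a row.
row_by_user = {}

def record_activity(user_id, item_type, activity_id):
    """Insert one composite column into the user's row, keeping it sorted."""
    columns = row_by_user.setdefault(user_id, [])
    bisect.insort(columns, (item_type, activity_id))

def slice_latest(user_id, item_type, count=10):
    """One reversed slice query: the newest `count` activityIds for one item type."""
    columns = row_by_user.get(user_id, [])
    lo = bisect.bisect_left(columns, (item_type,))
    hi = bisect.bisect_right(columns, (item_type, float("inf")))
    newest_first = columns[lo:hi][::-1]
    return [activity_id for _, activity_id in newest_first[:count]]

def latest_activities(user_id, per_type=10):
    """Five separate slices, one per known item type."""
    return {t: slice_latest(user_id, t, per_type) for t in ITEM_TYPES}
```

The point of the sketch is that each item type occupies its own contiguous region of the row, so a single contiguous slice cannot return "the last 10 of each"; one slice per item type is needed.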

can also do some wacky stuff around maintaining a row that explicitly only holds the last 10 items by itemType (meaning you have to delete the oldest one every time you insert a new one), but that prolly requires read-on-write etc and is a lot messier. and you will prolly need to worry about the case where you (transiently) have more than 10 'latest' items for a single itemType.
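A rough sketch of the capped "latest 10" row idea, again as an in-memory model rather than real client code. `latest_row`, `MAX_PER_TYPE`, and `record_activity` are hypothetical names. The insert-then-trim order is deliberate: between the two steps the row briefly holds more than the limit, which is the transient case mentioned above (with concurrent writers on a real cluster it could be worse).

```python
MAX_PER_TYPE = 10

# user_id -> {item_type: [activityId, ...]} with the newest activity first.
latest_row = {}

def record_activity(user_id, item_type, activity_id, limit=MAX_PER_TYPE):
    """Insert a new activity, then read the row back and delete the overflow.

    Returns the activityIds that were deleted (the read-on-write step).
    """
    columns = latest_row.setdefault(user_id, {}).setdefault(item_type, [])
    columns.append(activity_id)      # write: row may now hold limit + 1 items
    columns.sort(reverse=True)       # read: find out what is currently there
    evicted = columns[limit:]        # everything older than the newest `limit`
    del columns[limit:]              # delete the oldest entries
    return evicted
```

Reading is then a single fetch of the whole row, at the cost of the extra read and delete on every write.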

On 28/03/2012 09:49, Maciej Miklas wrote:
yes - but anyway in your example you need a "key range query" and that requires OPP, right?

On Tue, Mar 27, 2012 at 5:13 PM, Guy Incognito <dnd1066@gmail.com> wrote:
multiget does not require OPP.
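To illustrate why this holds, here is a small sketch of how a hash-based partitioner can locate each requested key independently. It is a simplified model, not Cassandra's actual implementation: the 4-node ring, `NODE_TOKENS`, and the function names are assumptions made for the example. Only a scan over a contiguous range of keys depends on keys being stored in order.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 128  # size of the MD5 output space used in this sketch

# Hypothetical 4-node ring with evenly spaced tokens.
NODE_TOKENS = [i * RING_SIZE // 4 for i in range(4)]

def token(row_key):
    """Hash a row key to a position on the ring (MD5, as a hash partitioner might)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def owner(row_key):
    """Index of the first node whose token is >= the key's token, wrapping around."""
    idx = bisect_left(NODE_TOKENS, token(row_key))
    return idx % len(NODE_TOKENS)

def multiget_plan(keys):
    """Each key is looked up on its own; no ordering between keys is needed."""
    return {key: owner(key) for key in keys}
```

For example, `multiget_plan(["u1:photo", "u1:video"])` maps each key to a node on its own, and the two keys may well land on different nodes.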

On 27/03/2012 09:51, Maciej Miklas wrote:
multiget would require Order Preserving Partitioner, and this can lead to unbalanced ring and hot spots.

Maybe you can use a secondary index on "itemtype" - it must have small cardinality: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/


On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito <dnd1066@gmail.com> wrote:
without the ability to do disjoint column slices, i would probably use 5 different rows.

userId:itemType -> activityId

then it's a multiget slice of 10 items from each of your 5 rows.
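A minimal sketch of this five-row layout, using a plain dict as a stand-in for the column family. The item type names and the helpers `record_activity`, `multiget_slice`, and `latest_activities` are hypothetical and are not a real client API; the sketch only shows the key layout and the shape of the read.

```python
from collections import defaultdict

# Hypothetical item types; the thread only says there are 5 of them.
ITEM_TYPES = ["photo", "video", "link", "note", "event"]

# Row key "userId:itemType" -> {activityId: payload}.
# activityId is a timestamp, so sorting column names gives time order.
rows = defaultdict(dict)

def record_activity(user_id, item_type, activity_id, payload=""):
    """Write one activity under the row key 'userId:itemType'."""
    rows[f"{user_id}:{item_type}"][activity_id] = payload

def multiget_slice(keys, count):
    """For each row key, return up to `count` newest column names (a reversed slice)."""
    return {key: sorted(rows.get(key, {}), reverse=True)[:count] for key in keys}

def latest_activities(user_id, per_type=10):
    """Build the 5 known row keys and fetch the newest activities from each."""
    keys = [f"{user_id}:{item_type}" for item_type in ITEM_TYPES]
    return multiget_slice(keys, per_type)
```

This only works when the reader already knows the item types, since the row keys have to be constructed up front.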


On 26/03/2012 22:16, Ertio Lew wrote:
I need to store activities by each user, on 5 item types. I always want to read the last 10 activities on each item type, by a user (i.e., total activities to read at a time = 50).

I want to store these activities in a single row for each user so that they can be retrieved in a single row query, since I want to read all of the last 10 activities on each item. I am thinking of creating composite names by appending "itemtype" : "activityId" (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all itemtypes.

Any ideas about a schema to do this in a better way?





