Date: Thu, 17 Nov 2011 11:04:28 +0530
From: Aditya
To: user@cassandra.apache.org
Subject: Re: Seeking advice on Schema and Caching

On Thu, Nov 17, 2011 at 10:25 AM, samal <samalgorai@gmail.com> wrote:
>> Edanuff + Beautiful People

I think the "row cache" could be the best fit, but it can use a lot of resources depending on row size. It will only touch disk once (the first time, when the row is read from the SSTables); the rest of the requests for that row will be served from memory. Try increasing the row cache size and decreasing the save period to an appropriate value:
Row cache size / save period in seconds: 200/30

Very nice. I didn't know that we could also set the "save period". This makes the job easier: now I can reduce the period to 30 seconds and set the row cache size to a reasonable limit. Thanks :)

Yes, there may be rows that will be very wide. I'll need to figure out whether I can do something better for those, but even this won't be a problem as long as my cache period is reasonable and the cache size is set to a sensible limit, right?
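A note on the "save period", with a sketch. As far as I know, in Cassandra 0.8/1.0 the row cache is bounded by its size (rows are evicted LRU-style when it is full), and the save period only controls how often the cache contents are saved to disk so they can be reloaded after a restart; it does not expire rows after that many seconds. If the goal is to keep a search row around for roughly 30 seconds, an application-side cache with a time-to-live is one way to get that behaviour. Below is a minimal sketch under that assumption; `ShortLivedRowCache`, `ttl_seconds` and `max_rows` are hypothetical names, and the clock is injectable so expiry can be tested.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class ShortLivedRowCache:
    """Hypothetical application-side row cache (not a Cassandra API).

    Rows expire ttl_seconds after they were stored, and the least recently
    used row is evicted once more than max_rows rows are held.
    """

    def __init__(self, ttl_seconds: float = 30.0, max_rows: int = 200,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_rows = max_rows
        self.clock = clock
        # row key -> (time stored, row data); dict order tracks recent use
        self._rows: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._rows.get(key)
        if entry is None:
            return None  # miss: caller reads the row from Cassandra
        stored_at, row = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._rows[key]  # expired: treat it as a miss
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return row

    def put(self, key: Hashable, row: object) -> None:
        self._rows[key] = (self.clock(), row)
        self._rows.move_to_end(key)
        while len(self._rows) > self.max_rows:
            self._rows.popitem(last=False)  # evict the least recently used row
```

With `max_rows` kept small, rows nobody is searching in simply age out, which matches the idea of caching a row only for the duration of one search.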

>> One catch: this is only good for small rows. Since one row contains every entry sharing the same first 3 characters, one row could become very large while others remain very thin.
e.g.:
many people can have the name Aditya
adi{
{tya,1}
.
.
}

but only a few people will have names starting with x or y.
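To see how uneven the rows can get, one can bucket a sample list of names by their first three letters, as the schema does, and count the columns each row would hold. A minimal sketch; the sample names are made up for illustration, and lower-casing the row key is an assumption (it follows samal's lowercase "adi" example).

```python
from collections import Counter
from typing import Iterable


def row_widths(names: Iterable[str], prefix_len: int = 3) -> Counter:
    """Count the columns each prefix row would hold, assuming the row key
    is the lower-cased first `prefix_len` letters of each name."""
    return Counter(name[:prefix_len].lower() for name in names)


if __name__ == "__main__":
    # Hypothetical sample data, one entry per user.
    sample = ["Aditya", "Aditya", "Adithya", "Adil", "Xavier", "Marcos"]
    for key, width in row_widths(sample).most_common():
        print(f"{key}: {width} column(s)")
```

Running this over a real export of user names would show which prefix rows are likely to become wide.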



On Thu, Nov 17, 2011 at 3:29 AM, Aditya <adynnn@gmail.com> wrote:
Thanks to samal, who pointed me to composite columns. I am now using composite column names containing username+userId, with valueless columns. Column names are therefore unique even for users with the same name, because the userId is part of the composite column name. This resolves the supercolumn issue.
But I am still seeking some advice on the caching strategy for these rows. While a user is doing a search, the DB will be queried multiple times, because I'm not keeping the retrieved columns in the application layer. So I am thinking of caching this row so that further queries are served from the cache. The important point, however, is that I want to use very few resources for this cache, so that rows stay in the cache only for a short time, just long enough to serve a single search (at most around 30 seconds). Is this approach correct? That way I won't keep unnecessary data in the cache for a long time, saving resources for other needs.


On Wed, Nov 16, 2011 at 11:20 AM, samal <samalgorai@gmail.com> wrote:
I think you can, but I am not sure; I haven't tried that yet. There is no harm in keeping the value as well; it will still be read in a single query.

In the 2nd case, yes, 2 or more queries are required to get a specific user's details, since the username maps to user_id keys (unique, like a UUID) and each user_id key stores the actual details.
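The two-query pattern described for the second option can be modelled with plain dictionaries standing in for the two column families: the first lookup reads the name row to get user ids, the second reads each user's own row. This is only an in-memory sketch; `name_index`, `users` and `find_users` are hypothetical stand-ins, not a Cassandra client API, and the sample data is made up.

```python
from typing import Dict, List

# Hypothetical index CF: row key = full name, column names = user ids,
# column values left empty ("valueless" columns).
name_index: Dict[str, Dict[str, str]] = {
    "marcos": {"user1": "", "user2": "", "user3": ""},
}

# Hypothetical users CF: row key = user id, columns = user details.
users: Dict[str, Dict[str, str]] = {
    "user1": {"name": "Marcos", "email": "marcos1@example.com"},
    "user2": {"name": "Marcos", "email": "marcos2@example.com"},
    "user3": {"name": "Marcos", "email": "marcos3@example.com"},
}


def find_users(name: str) -> List[Dict[str, str]]:
    # Query 1: read the index row; its column names are the user ids.
    user_ids = sorted(name_index.get(name.lower(), {}))
    # Query 2: read each user's row (a single multiget in a real client).
    return [users[uid] for uid in user_ids if uid in users]
```

The trade-off is the one raised in the thread: the index row stays narrow per name, but every search costs at least one extra round trip.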


On Wed, Nov 16, 2011 at 11:10 AM, Aditya Narayan <adynnn@gmail.com> wrote:
Regarding the first option that you suggested using composite columns: can I store both the username and the id in the column name and keep the column valueless?
Will I be able to retrieve both the username and the id from the composite column name?
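On getting both parts back: a composite column name is stored as a sequence of components, so a valueless column still carries both the username and the id. The sketch below encodes and decodes such names using what I understand to be the CompositeType layout (for each component: a 2-byte big-endian length, the bytes, then one end-of-component byte). Treat the exact byte layout as an assumption and prefer your client library's composite support in practice; `encode_composite` and `decode_composite` are hypothetical helpers.

```python
import struct
from typing import List


def encode_composite(*components: str) -> bytes:
    """Pack UTF-8 components as <2-byte length><bytes><end-of-component byte>."""
    out = bytearray()
    for comp in components:
        raw = comp.encode("utf-8")
        out += struct.pack(">H", len(raw))  # unsigned short, big-endian
        out += raw
        out += b"\x00"  # end-of-component marker (0 for an exact value)
    return bytes(out)


def decode_composite(name: bytes) -> List[str]:
    """Split an encoded composite name back into its string components."""
    parts: List[str] = []
    pos = 0
    while pos < len(name):
        (length,) = struct.unpack_from(">H", name, pos)
        pos += 2
        parts.append(name[pos:pos + length].decode("utf-8"))
        pos += length + 1  # skip the value and the end-of-component byte
    return parts
```

For example, `decode_composite(encode_composite("cos", "user-42"))` gives back both the name suffix and the id.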

Thanks a lot

On Wed, Nov 16, 2011 at 10:56 AM, Aditya Narayan <adynnn@gmail.com> wrote:
Got the first option that you suggested.

However, in the second one, are you suggesting using, e.g., key='Marcos' and storing columns containing the userIds of all users with that name inside that row? That way it would have to read multiple rows while the user is doing a single search.


On Wed, Nov 16, 2011 at 10:47 AM, samal <samalgorai@gmail.com> wrote:

> I need to add 'search users' functionality to my application. (Fetching of search results, like Google Instant search, is triggered once 3 letters have been typed.)
>
> For this, I make a CF with String type keys. Each key is made of the first 3 letters of a user's name.
>
> Thus all names starting with 'Mar-' are stored in a single row (with key="Mar").
> The column names are formed from the remaining letters of the names. Thus, the name 'Marcos' will be stored in row key "Mar" with column name "cos", and the id will be stored as the column value. Since there could be many users with the same name, I would have multiple userIds (of users named "Marcos") to store inside column name "cos" under key "Mar". Thus,
>
> 1. A supercolumn seems to be a better fit for my use case (so that the ids of users with the same name can fit as sub-columns inside a super-column), but since supercolumns are discouraged, I want to use an alternative schema for this use case if possible. Could you suggest some ideas on this?
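The original layout can be sketched with one dictionary per row to show why it runs into trouble: with the name suffix alone as the column name, a second "Marcos" overwrites the first one's id. The helpers `index_key` and `add_user` and the in-memory `rows` dict are hypothetical, used only to illustrate the problem.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the CF: row key -> {column name: value}.
rows: Dict[str, Dict[str, str]] = defaultdict(dict)


def index_key(name: str) -> Tuple[str, str]:
    """Split a name into (row key, column name): first 3 letters, then the rest."""
    return name[:3], name[3:]


def add_user(name: str, user_id: str) -> None:
    row_key, column = index_key(name)
    rows[row_key][column] = user_id  # only one value per column name


add_user("Marcos", "u1")
add_user("Marcos", "u2")  # same row and column, so "u1" is silently replaced
```

Making the user id part of the column name (as a supercolumn or a composite column would) is what keeps both entries.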
>

Aditya,

Have you given any thought to composite columns [1]? I think they can help you solve the problem of multiple users with the same name.

mar:{
  {cos,unique_user_id}:unique_user_id,
  {cos,1}:1,
  {cos,2}:2,
  {cos,3}:3,

  // {utf8,timeUUID}:timeUUID,
}
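The composite layout above keeps a row's columns sorted by (suffix, user_id), which is what makes the search work as more letters are typed: all matches for a longer prefix form one contiguous run of columns in the row. The sketch below models each row as a sorted list of tuples and uses `bisect` to find the start of that run; in Cassandra the equivalent would be a column slice. `index`, `add_user` and `search` are hypothetical names, and lower-casing keys and suffixes is an assumption.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model of the index CF: row key -> sorted list of composite
# column names (suffix, user_id). Columns are valueless here.
index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)


def add_user(name: str, user_id: str) -> None:
    key, suffix = name[:3].lower(), name[3:].lower()
    bisect.insort(index[key], (suffix, user_id))  # keep the row sorted


def search(typed: str) -> List[Tuple[str, str]]:
    """Return the (suffix, user_id) columns matching what has been typed."""
    key, rest = typed[:3].lower(), typed[3:].lower()
    row = index.get(key, [])
    # First column whose suffix sorts at or after the typed remainder.
    start = bisect.bisect_left(row, (rest,))
    matches = []
    for suffix, user_id in row[start:]:
        if not suffix.startswith(rest):
            break  # sorted order: no later column can match
        matches.append((suffix, user_id))
    return matches
```

Every keystroke after the third reads from the same row, so caching that one row also covers the repeated queries made during a single search.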
OR
you can try wide rows that index user names to IDs:

marcos{
  user1:' ',
  user2:' ',
  user3:' '
}

[1] http://www.slideshare.net/edanuff/indexing-in-cassandra