From: Aditya Narayan
To: user@cassandra.apache.org
Date: Wed, 16 Nov 2011 09:52:19 +0530
Subject: Re: Seeking advice on Schema and Caching

Hi Ben,

Solr, as I understand it, is for implementing full-text search within documents. In my case, for now, I only need to search on user names, which Cassandra seems to provide easily because user names (stored as column names) are kept sorted alphabetically within a row. I am splitting these rows by the first three characters of the user name, so all user names starting with 'Mar' are stored in a row with key 'Mar'. Column values store the userId of that user.

So Cassandra seems to fully satisfy my needs here. The only issue I am having is how to deal with multiple users who have the same name. Super columns seem to fit well, but I really want to avoid them since they are widely discouraged.

On Wed, Nov 16, 2011 at 3:19 AM, Ben Gambley wrote:

> Hi Aditya
>
> Not sure of the best way to do this in Cassandra, but have you considered using
> Apache Solr? You could then include just the row keys pointing back
> to Cassandra, where the actual data is.
>
> Solr seems quite capable of performing Google-like searches and is fast.
>
> Cheers
> Ben
>
> On 16/11/2011, at 1:50 AM, Aditya Narayan wrote:
>
> > Hi
> >
> > I need to add 'search users' functionality to my application. (Fetching results, as in Google instant search, is triggered once 3 letters have been typed.)
> >
> > For this, I make a CF with String-type keys. Each key is made of the first 3 letters of a user's name.
> >
> > Thus all names starting with 'Mar-' are stored in a single row (with key="Mar").
> > The column names are the remaining letters of the names. Thus the name 'Marcos' is stored under row key "Mar" with column name "cos", and the user id is stored as the column value. Since there could be many users with the same name, I would have multiple userIds (of users named "Marcos") to store under column name "cos" in row "Mar". Thus:
> >
> > 1. A super column seems to be a better fit for my use case (so that the ids of users with the same name fit as sub-columns inside a super column), but since super columns are not encouraged, I want to use an alternative schema for this use case if possible. Could you suggest some ideas on this?
> >
> > 2. Another thing: I would like to row-cache this CF so that when the user types the next character and the follow-up query is made, the row is served from the cache without touching the DB. While searching for a single user name, the query (as part of making instant suggestions) is expected to be made at least 2-3 times. One could instead fetch all the columns starting with the queried string and filter them at the application level, but what about fetching just the exact number of columns (ids/names of users) I need to show to the user? That is, instead of holding all the hundreds of columns in the application layer, why not keep them in the DB cache?
> > The space allotted for the cache would be very small, so that a row remains in the cache only for a very short time (just long enough to serve a single user's search).

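For illustration, the layout described in this message can be sketched in a few lines of Python. This is only an in-memory stand-in for a row with sorted column names, not Cassandra client code. The class and method names (`UserNameIndex`, `add_user`, `search`) are hypothetical. Appending the userId to the column name to keep users with the same name apart is an assumption of this sketch, not something proposed in the thread.

```python
import bisect
from collections import defaultdict

PREFIX_LEN = 3  # row key length (first 3 letters), as described in the message


class UserNameIndex:
    """In-memory stand-in for the CF: row key -> sorted list of column names."""

    def __init__(self):
        # Assumption of this sketch: each column name is the pair
        # (remaining letters of the name, user_id), so that two users
        # called "Marcos" end up in two distinct, adjacent columns.
        self._rows = defaultdict(list)

    def add_user(self, name, user_id):
        row_key, remainder = name[:PREFIX_LEN], name[PREFIX_LEN:]
        column = (remainder, user_id)
        row = self._rows[row_key]
        pos = bisect.bisect_left(row, column)
        if pos == len(row) or row[pos] != column:
            row.insert(pos, column)  # keep the columns sorted

    def search(self, typed, limit=10):
        """Return up to `limit` (full_name, user_id) pairs matching `typed`."""
        if len(typed) < PREFIX_LEN:
            return []  # the search is only triggered after 3 letters
        row_key, remainder = typed[:PREFIX_LEN], typed[PREFIX_LEN:]
        row = self._rows.get(row_key, [])
        # Jump to the first column whose name is >= the typed remainder,
        # then read forward; this mimics a column slice with a count limit.
        start = bisect.bisect_left(row, (remainder,))
        results = []
        for col_remainder, user_id in row[start:]:
            if len(results) == limit or not col_remainder.startswith(remainder):
                break
            results.append((row_key + col_remainder, user_id))
        return results


index = UserNameIndex()
for name, user_id in [("Marcos", 17), ("Marcos", 42), ("Maria", 5),
                      ("Marcus", 8), ("Mark", 3)]:
    index.add_user(name, user_id)

print(index.search("Marc", limit=2))  # [('Marcos', 17), ('Marcos', 42)]
```

Because the columns stay sorted, each extra character typed only narrows the slice within the same row, which is why the same row is read several times during a single search.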