Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of rajkumar.w93@gmail.com
 designates 209.85.214.44 as permitted sender)
MIME-Version: 1.0
Sender: rajkumar.w93@gmail.com
In-Reply-To: <1729105504170368002@unknownmsgid>
References: 
 <CANGD+ioHEskxEC0WiUb1OXatBJGvn0kLWktKusZ1ysSwi0P_kQ@mail.gmail.com>
	<1729105504170368002@unknownmsgid>
Date: Wed, 16 Nov 2011 09:52:19 +0530
Message-ID: 
 <CANGD+io+VGNwi=0fUBG2y1pb+-+uozhTE7CN9VMh6Tkf7+jVcA@mail.gmail.com>
Subject: Re: Seeking advice on Schema and Caching
From: Aditya Narayan <adynnn@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=000e0cdfcbc62600d204b1d27399

--000e0cdfcbc62600d204b1d27399
Content-Type: text/plain; charset=ISO-8859-1

Hi Ben,

Solr, as I understand it, is for implementing full-text search within
documents. In my case, for now, I just need to search on user names, which
Cassandra seems to handle easily, since user names (stored as column names)
are sorted alphabetically within a row. I am splitting these rows by the
first three characters of the username, so all user names starting with
'Mar' are stored in a row with key 'Mar'. Column values store the userId of
that user.

So Cassandra seems to fully satisfy my needs here. The only issue I'm
having is how to deal with multiple users who share the same name. Super
columns seem to fit this well, but I really want to avoid them since
they are widely discouraged.
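
To make the layout concrete, here is a minimal in-memory sketch in Python. It
is not Cassandra client code: the PrefixIndex class, the SEP separator and
the idea of appending the userId to the column name (so that two users named
"Marcos" get distinct columns in a plain row) are all assumptions made for
illustration, not something I have verified against a real cluster.

```python
from bisect import bisect_left
from collections import defaultdict

# Assumed separator between name and userId; it sorts before any printable
# character, so "Marcos<SEP>u1" still sorts ahead of "Marcosa<SEP>u9".
SEP = "\x00"


class PrefixIndex:
    """In-memory stand-in for the CF: row key -> sorted column names."""

    def __init__(self, prefix_len=3):
        self.prefix_len = prefix_len
        self.rows = defaultdict(list)  # row key -> sorted list of columns

    def add_user(self, name, user_id):
        # The row key is the first three characters of the user name.
        row_key = name[:self.prefix_len]
        # Hypothetical composite column name: user name plus userId, so
        # users sharing a name do not overwrite each other.
        column = name + SEP + user_id
        cols = self.rows[row_key]
        i = bisect_left(cols, column)
        if i == len(cols) or cols[i] != column:
            cols.insert(i, column)  # keep columns sorted, as a row would

    def search(self, typed, limit=10):
        # Nothing is fetched until three letters have been typed.
        if len(typed) < self.prefix_len:
            return []
        cols = self.rows.get(typed[:self.prefix_len], [])
        # Start the slice at the first column >= the typed string.
        start = bisect_left(cols, typed)
        results = []
        for column in cols[start:start + limit]:
            if not column.startswith(typed):
                break
            name, user_id = column.split(SEP, 1)
            results.append((name, user_id))
        return results


index = PrefixIndex()
index.add_user("Marcos", "u1")
index.add_user("Marcos", "u2")
index.add_user("Marcus", "u4")
index.add_user("Maria", "u3")
print(index.search("Marc"))
```

The search method mimics a column slice that starts at the typed string and
stops after a fixed number of columns, so only the few names to be displayed
are read from the row.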


On Wed, Nov 16, 2011 at 3:19 AM, Ben Gambley <ben.gambley@intoscience.com> wrote:

> Hi Aditya
>
> Not sure of the best way to do this in Cassandra, but have you considered
> using Apache Solr? You could then include just the row keys pointing back
> to Cassandra, where the actual data is.
>
> Solr seems quite capable of performing Google-like searches and is fast.
>
>
>
> Cheers
> Ben
>
> On 16/11/2011, at 1:50 AM, Aditya Narayan <adynnn@gmail.com> wrote:
>
> > Hi
> >
> > I need to add 'search users' functionality to my application. (Fetching
> > of search results, as in Google Instant search, is triggered once 3
> > letters have been typed in.)
> >
> > For this, I make a CF with String type keys. Each such key is made of
> > the first 3 letters of a user's name.
> >
> > Thus all names starting with 'Mar-' are stored in a single row (with
> > key="Mar").
> > The column names are formed from the remaining letters of the names.
> > Thus, a name 'Marcos' will be stored under row key "Mar" and column name
> > "cos". The id will be stored as the column value. Since there could be
> > many users with the same name, I would have multiple userIds (of users
> > named "Marcos") to store in column "cos" under key "Mar". Thus,
> >
> > 1. A supercolumn seems to be a better fit for my use case (so that ids
> > of users with the same name may fit as sub-columns inside a super-column),
> > but since supercolumns are not encouraged, I want to use an alternative
> > schema for this use case if possible. Could you suggest some ideas on this?
> >
> > 2. Another thing: I would like to row cache this CF so that when the
> > user types the next character and the query is issued again, the row is
> > retrieved from the cache without touching the DB. While searching for a
> > single username, the query (as part of making instant suggestions) is
> > expected to be made at least 2-3 times. One could instead fetch all the
> > columns starting with the queried string and then filter at the
> > application level, but what about fetching just the exact number of
> > columns (ids/names of users) I need to show to the user? Instead of
> > keeping all the hundreds of columns in the application layer, what about
> > keeping them in the DB cache?
> > The space allotted for the cache will be very small, so that a row
> > remains in cache for a very short time (only long enough to serve a
> > single user's search)?
> >
> >
>

--000e0cdfcbc62600d204b1d27399--