Hi Ben,

Solr, as I understand it, is for implementing full-text search within documents. In my case, for now I only need search on user names, which Cassandra seems to provide easily, since user names (stored as column names) are sorted alphabetically within a row. I am splitting these rows by the first three characters of the user name, so all user names starting with 'Mar' are stored in a row with key 'Mar'. The column values store the userId of that user.
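
To make the layout concrete, here is a rough in-memory sketch in Python (not Cassandra client code). The names `rows`, `add_user` and `search` are made up for illustration, and a sorted list stands in for Cassandra's sorted column names within a row:

```python
from bisect import bisect_left, insort

# Hypothetical stand-in for the column family:
# row key -> (sorted column names, {column name: userId})
rows = {}

def add_user(name, user_id):
    """Store the user under the row keyed by the first 3 letters of the name."""
    names, values = rows.setdefault(name[:3], ([], {}))
    if name not in values:
        insort(names, name)      # keep column names sorted, as Cassandra does
    values[name] = user_id       # a second user with the same name overwrites the first

def search(prefix, limit=10):
    """Return up to `limit` (name, userId) pairs whose name starts with `prefix`."""
    names, values = rows.get(prefix[:3], ([], {}))
    start = bisect_left(names, prefix)   # first column name >= prefix
    matches = []
    for name in names[start:start + limit]:
        if not name.startswith(prefix):
            break                        # past the end of the matching range
        matches.append((name, values[name]))
    return matches
```

With 'Marcos', 'Marcus', 'Maria' and 'Mark' stored, `search("Marc")` reads only the 'Mar' row and returns the first two. It also shows the limitation: one column name can hold only one userId.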

So Cassandra seems to fully satisfy my needs here. The only issue I am having is how to deal with multiple users who have the same name. Super columns seem to fit, but I really want to avoid them since they are strongly discouraged by everyone.
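
One alternative I could imagine (an assumption on my part, not something I have tested against Cassandra) is to fold the userId into the column name, so that each (name, userId) pair becomes its own column and the column value can stay empty. Again a rough in-memory sketch in Python, with `index`, `index_user` and `find_users` as made-up names:

```python
from bisect import bisect_left

# Hypothetical stand-in for the column family:
# row key -> sorted list of composite column names (name, userId)
index = {}

def index_user(name, user_id):
    """Add the composite column (name, userId) to the row for name[:3]."""
    columns = index.setdefault(name[:3], [])
    column = (name, user_id)
    i = bisect_left(columns, column)
    if i == len(columns) or columns[i] != column:
        columns.insert(i, column)   # writing the same column twice is a no-op

def find_users(prefix, limit=10):
    """Return up to `limit` (name, userId) pairs whose name starts with `prefix`."""
    columns = index.get(prefix[:3], [])
    start = bisect_left(columns, (prefix,))   # (prefix,) sorts before (prefix, any_id)
    matches = []
    for name, user_id in columns[start:start + limit]:
        if not name.startswith(prefix):
            break
        matches.append((name, user_id))
    return matches
```

Here two users named 'Marcos' with ids 3 and 7 end up as two adjacent columns in the 'Mar' row, and a prefix slice returns both without needing sub-columns.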

On Wed, Nov 16, 2011 at 3:19 AM, Ben Gambley <ben.gambley@intoscience.com> wrote:
Hi Aditya

Not sure of the best way to do this in Cassandra, but have you considered
using Apache Solr? You could then index just the row keys pointing back
to Cassandra, where the actual data is.

Solr seems quite capable of performing Google-like searches and is fast.


On 16/11/2011, at 1:50 AM, Aditya Narayan <adynnn@gmail.com> wrote:

> Hi
> I need to add 'search users' functionality to my application. (Fetching of matching items, as in Google Instant search, is triggered once 3 letters have been typed.)
> For this, I create a CF with String-type keys. Each key is made of the first 3 letters of a user's name.
> Thus all names starting with 'Mar-' are stored in a single row (with key="Mar").
> The column names are formed from the remaining letters of the names. Thus the name 'Marcos' will be stored under row key "Mar" and column name "cos". The id will be stored as the column value. Since there could be many users with the same name, I would have multiple userIds (of users named "Marcos") to store inside column name "cos" under key "Mar". Thus,
> 1. A super column seems to be a better fit for my use case (so that the ids of users with the same name fit as sub-columns inside a super column), but since super columns are discouraged, I want to use an alternative schema for this use case if possible. Could you suggest some ideas on this?
> 2. Another thing: I would like to row cache this CF, so that when the user types the next character and the query is made again, the row is retrieved from the cache without touching the DB. While searching for a single user name, the query (as part of making instantaneous suggestions) is expected to be made at least 2-3 times. One may also suggest fetching all the columns starting with the queried string and then filtering at the application level, but what about fetching just the exact number of columns (ids/names of users) I need to show to the user? Instead of keeping all the hundreds of columns in the application layer, what about keeping them in the DB cache?
> The space allotted for the cache will be very small, so that a row remains in the cache for only a very short time (just long enough to serve a single user search).