incubator-cassandra-user mailing list archives

From Olexiy Prokhorenko <ole...@prokhorenko.us>
Subject Best practices to build app with querying/searching functionality
Date Mon, 12 Apr 2010 22:34:53 GMT
Hello,

I asked this question on Stack Overflow
(http://stackoverflow.com/questions/2619744/searches-and-general-querying-with-hbase-and-or-cassandra-best-practices)
but didn't get many answers. Maybe some Cassandra people can help
me and point me in the right direction? So:

I have a User model object with quite a few fields (properties, if you
wish) in it. Say "firstname", "lastname", "city" and "year-of-birth".
Each user also gets a "unique id".

I want to be able to search by them. How do I do that properly? How do
I do that at all?

My understanding (this will work for pretty much any key-value storage;
first goes the key, then the value):

u:123456789 = serialized_json_object

("u" is a simple prefix for user keys; 123456789 is the "unique id".)
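To make the "u:<unique id>" layout concrete, here is a minimal sketch in Python. It assumes a plain in-memory dict as a stand-in for the key-value store (not a Cassandra client API), and the helper names save_user/load_user and the field values are hypothetical.

```python
import json

# Hypothetical in-memory stand-in for a key-value store (a plain dict),
# not a Cassandra client API.
store: dict[str, str] = {}

def user_key(user_id: int) -> str:
    """Build the "u:<unique id>" key."""
    return f"u:{user_id}"

def save_user(user_id: int, user: dict) -> None:
    # The whole User object is serialized to JSON and stored as one value.
    store[user_key(user_id)] = json.dumps(user)

def load_user(user_id: int) -> dict:
    return json.loads(store[user_key(user_id)])

save_user(123456789, {"firstname": "Steve", "lastname": "Anything",
                      "city": "Chicago", "year-of-birth": 1980})
print(store["u:123456789"])
# {"firstname": "Steve", "lastname": "Anything", "city": "Chicago", "year-of-birth": 1980}
```

Fetching a user by id is a single lookup here; searching by any other field is what needs the extra index keys described next.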

Now, since I want to be able to search by firstname and
lastname, I can save:

f:Steve = u:384734807,u:2398248764,u:23276263
f:Alex = u:12324355,u:121324334

so the key is "f:Steve", where "f" is the prefix for firstnames and
"Steve" is the actual firstname. Under "f:Steve" we save, as the value,
the ids of all users who are named "Steve".
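A minimal sketch of maintaining such an index entry, again assuming a plain in-memory dict as the store and a comma-separated string as the value; the helper name index_add is hypothetical.

```python
# Hypothetical in-memory key-value store; index values are comma-separated
# lists of user keys, as in the "f:Steve" example.
store: dict[str, str] = {}

def index_add(prefix: str, value: str, user_key: str) -> None:
    """Append user_key to the list stored under "<prefix>:<value>"."""
    key = f"{prefix}:{value}"
    current = store.get(key, "")
    ids = current.split(",") if current else []
    if user_key not in ids:  # avoid duplicate entries
        ids.append(user_key)
    store[key] = ",".join(ids)

for uid in ("u:384734807", "u:2398248764", "u:23276263"):
    index_add("f", "Steve", uid)
for uid in ("u:12324355", "u:121324334"):
    index_add("f", "Alex", uid)

print(store["f:Steve"])  # u:384734807,u:2398248764,u:23276263
print(store["f:Alex"])   # u:12324355,u:121324334
```

Note that every addition is a read-modify-write of the whole value, which is where the problems listed below come from.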

That makes every search very easy. Querying by a few fields
(properties), say by firstname ("Steve") and lastname ("Anything",
stored under "l:Anything" with "l" as the lastname prefix), is still
easy: first get the list of user ids from "f:Steve", then the list from
"l:Anything", find the user ids that appear in both, and there you go.
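The "find the user ids that appear in both" step is a set intersection. A minimal sketch, assuming the same in-memory dict layout; the "l:Anything" contents and the id "u:555" are made up for illustration, and find_users is a hypothetical helper.

```python
# Hypothetical index contents; "u:555" is a made-up user whose lastname is
# "Anything" but whose firstname is not "Steve".
store = {
    "f:Steve": "u:384734807,u:2398248764,u:23276263",
    "l:Anything": "u:2398248764,u:555,u:23276263",
}

def lookup(key: str) -> set[str]:
    value = store.get(key, "")
    return set(value.split(",")) if value else set()

def find_users(*keys: str) -> set[str]:
    """AND query: ids present under every given index key."""
    if not keys:
        return set()
    result = lookup(keys[0])
    for key in keys[1:]:
        result &= lookup(key)  # keep only ids seen under every key so far
    return result

print(sorted(find_users("f:Steve", "l:Anything")))
# ['u:23276263', 'u:2398248764']
```

With long lists, starting from the shortest one keeps the intersection cheaper, but each list still has to be fetched in full.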

Problems (and there are quite a few):

1. Saving, updating and deleting a user is a pain. It has to be an
atomic and consistent operation across the user record and every index
key the user appears in. Also, if the size of a value is limited, then
we are in (potential) trouble, because the id list for a popular name
keeps growing. I don't really have an answer here. Only zipping the list
of user ids? Not too cool, though.

2. What if we eventually want to add a new field to search by? Say by
"city". We certainly can do it the same way ("c:Los Angeles" = ...,
"c:Chicago" = ...), but if we didn't foresee all those "search choices"
from the very beginning, then we will have to create some nightly job
or something to go through all existing User records and create those
"c:CITY" entries for them... Quite a big job!

3. Problems with locking. User "u:123" updates his name to "Alex", and
user "u:456" updates his name to "Alex". They both have to update
"f:Alex" with their ids. That means either we run into an overwriting
problem, or one update has to wait for the other (and imagine if there
are many of them!).
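The three problems can be reproduced with the same kind of in-memory sketch. It assumes a plain dict with no transactions and no locks as the store, and the function names (rename, backfill, lost_update) and sample users are hypothetical. Problem 1 shows that one rename is three separate writes; problem 2 shows that a new index means scanning every "u:" record; problem 3 replays the "f:Alex" race deterministically by interleaving two read-modify-write cycles by hand (no threads). Requires Python 3.9+ for the dict[str, str] alias.

```python
import json

# Hypothetical in-memory stand-in for a key-value store: a plain dict with
# no transactions and no locks. Index values are comma-separated user keys.
Store = dict[str, str]

def get_ids(store: Store, key: str) -> list[str]:
    value = store.get(key, "")
    return value.split(",") if value else []

def put_ids(store: Store, key: str, ids: list[str]) -> None:
    if ids:
        store[key] = ",".join(ids)
    else:
        store.pop(key, None)  # drop index entries that became empty

# Problem 1: renaming one user takes three separate writes.
def rename(store: Store, user_key: str, new_first: str) -> None:
    user = json.loads(store[user_key])
    old_first = user["firstname"]
    user["firstname"] = new_first
    store[user_key] = json.dumps(user)  # write 1: the user record
    old_ids = [i for i in get_ids(store, f"f:{old_first}") if i != user_key]
    put_ids(store, f"f:{old_first}", old_ids)  # write 2: old index entry
    new_ids = get_ids(store, f"f:{new_first}")
    if user_key not in new_ids:
        new_ids.append(user_key)
    put_ids(store, f"f:{new_first}", new_ids)  # write 3: new index entry
    # A failure between writes leaves the record and the index disagreeing.

# Problem 2: an index added later needs a scan over every user record.
def backfill(store: Store, field: str, prefix: str) -> None:
    index: dict[str, list[str]] = {}
    for key in sorted(k for k in store if k.startswith("u:")):
        user = json.loads(store[key])
        if field in user:
            index.setdefault(f"{prefix}:{user[field]}", []).append(key)
    for index_key, ids in index.items():
        put_ids(store, index_key, ids)

# Problem 3: two unsynchronized read-modify-write cycles on "f:Alex".
def lost_update(store: Store) -> None:
    seen_by_123 = get_ids(store, "f:Alex")  # client for u:123 reads
    seen_by_456 = get_ids(store, "f:Alex")  # client for u:456 reads the same list
    put_ids(store, "f:Alex", seen_by_123 + ["u:123"])
    put_ids(store, "f:Alex", seen_by_456 + ["u:456"])  # overwrites u:123's write

users = {
    "u:123": json.dumps({"firstname": "Steve", "city": "Chicago"}),
    "u:456": json.dumps({"firstname": "Sam", "city": "Los Angeles"}),
    "f:Steve": "u:123",
    "f:Sam": "u:456",
}
rename(users, "u:123", "Alex")
backfill(users, "city", "c")
print(users["f:Alex"], users["c:Chicago"], users["c:Los Angeles"])
# u:123 u:123 u:456

race = {"f:Alex": "u:789"}
lost_update(race)
print(race["f:Alex"])  # u:789,u:456  (u:123 is missing)
```

In a real deployment the writes in rename() and the scan in backfill() would go over the network, which widens the windows in which these inconsistencies can appear.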

What's the best way of doing this, keeping in mind that I want to
search by many fields?

Thanks!
