incubator-cassandra-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: full text search
Date Thu, 25 Feb 2010 01:13:51 GMT
Quick question about Facebook's indexing strategy... based on the fact that
all of the columns within a supercolumn must be serialized/deserialized
together, and therefore fit in memory, is there a point at which individual
Facebook users could start causing problems if they have a lot of messages?
The text below is copied from section 6.1 of the lakshman-ladis2009 paper.

"There are two kinds of search features
that are enabled today (a) term search (b) interactions
- given the name of a person return all messages that the
user might have ever sent or received from that person. The
schema consists of two column families. For query (a) the
user id is the key and the words that make up the message
become the super column. Individual message identifiers
of the messages that contain the word become the columns
within the super column. For query (b) again the user id is
the key and the recipients id's are the super columns. For
each of these super columns the individual message identifiers
are the columns."
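
To make the layout concrete, here is a rough sketch of how that schema reads, using plain Python dicts instead of any real Cassandra client. The names (term_index, interaction_index, index_message) and the whitespace tokenizing are made up for illustration and are not from the paper.

```python
from collections import defaultdict


def new_column_family():
    # row key (user id) -> super column name -> set of column names (message ids)
    return defaultdict(lambda: defaultdict(set))


term_index = new_column_family()         # query (a): super column = word
interaction_index = new_column_family()  # query (b): super column = other person's id


def index_message(user_id, message_id, other_user_id, text):
    """Record one message in both hypothetical column families."""
    # Naive whitespace tokenizing; a real indexer would do more.
    for word in set(text.lower().split()):
        term_index[user_id][word].add(message_id)
    interaction_index[user_id][other_user_id].add(message_id)


index_message("alice", 1, "bob", "lunch tomorrow")
index_message("alice", 2, "bob", "lunch moved to friday")

# Term search: all of alice's messages containing "lunch".
print(sorted(term_index["alice"]["lunch"]))       # [1, 2]
# Interactions: all messages between alice and bob.
print(sorted(interaction_index["alice"]["bob"]))  # [1, 2]
```

Every message exchanged with one person adds a column to the same supercolumn, which is what the question below is about.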


If a user sent 10,000 messages to another user over a few years, wouldn't
there be 10,000 message ids in a single supercolumn?  I guess that's only
about 80 kB (at 8 bytes per id), but if Facebook weren't partitioning by
user they would certainly run into problems, so it may not be a good example
for large non-partitioned indexes.  Or maybe I still don't understand how
supercolumns work.
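
The back-of-the-envelope number can be written out as below. This is only a sketch: the 8-byte id size is an assumption implied by the 80 kB figure, and per_column_overhead is a hypothetical knob for whatever each serialized column carries besides its name (the 15 bytes used in the second call is a placeholder, not a measured value).

```python
ID_BYTES = 8  # assumption: each message id is an 8-byte long


def supercolumn_bytes(num_messages, per_column_overhead=0):
    """Rough size of one supercolumn holding num_messages id columns."""
    return num_messages * (ID_BYTES + per_column_overhead)


# Ids only, matching the estimate in the message.
print(supercolumn_bytes(10_000))      # 80000
# Same count with a hypothetical 15 bytes of overhead per column.
print(supercolumn_bytes(10_000, 15))  # 230000
```

Either way the whole supercolumn has to be deserialized together, so the estimate grows linearly with the number of messages per correspondent.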

Matt


On Wed, Feb 24, 2010 at 7:52 PM, Nathan McCall <nate@vervewireless.com> wrote:

> The following paper on the Articles and Presentations section of the
> Cassandra wiki describes Facebook's inbox search implementation:
> http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
>
> -Nate
>
> On Wed, Feb 24, 2010 at 4:45 PM, Mohammad Abed <mohammad.abed@gmail.com> wrote:
> > Either of these solutions used in any production environment?
> >
> >
> >
> > On Wed, Feb 24, 2010 at 3:54 PM, Brandon Williams <driftx@gmail.com> wrote:
> >>
> >> On Wed, Feb 24, 2010 at 5:41 PM, Mohammad Abed <mohammad.abed@gmail.com> wrote:
> >>>
> >>> Any suggestions on how to pursue full text search with Cassandra, what
> >>> options are out there?
> >>
> >> Also: http://github.com/tjake/Lucandra
> >> -Brandon
> >
>
