cassandra-user mailing list archives

From Drew Schleck <>
Subject Re: Data Model Index Text
Date Sat, 09 Jan 2010 00:39:49 GMT
If I am reading this right, you want to query for a word and find all
of the documents that contain it? There may be a better way to do
this, but the way the people at Facebook do it is with supercolumns.
Inside the super column family they have a supercolumn for every word,
such as "Michael" and "Jordan", and within each of those they have
columns keyed by the ids of all of the documents that contain that word.
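
To make that layout concrete, here is a rough sketch in plain Python
(not Cassandra client code). The nested dict only mimics the shape of
the word -> document ids layout described above; build_index, the
sample docs, and the empty column values are all made up for
illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the ids of the documents that contain it.

    The outer key plays the role of the per-word entry, the inner keys
    play the role of the document ids stored under it (hypothetical
    stand-in for the column family, not a real Cassandra API).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # The value is unused; the inner key itself carries the doc id.
            index[word][doc_id] = ""
    return index

# Made-up sample documents.
docs = {
    "doc1": "Michael Jordan played basketball",
    "doc2": "Michael Faraday studied electricity",
    "doc3": "Jordan is a country",
}

index = build_index(docs)
print(sorted(index["michael"]))
print(sorted(index["jordan"]))
```

Looking up a single word is then one read of that word's entry; the
ids come back as the names stored under it.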

I suppose if you do it this way, a multi-word query forces you to work
out in memory which documents appear in every word's set, but if it's
good enough for Facebook I suppose it can't be too bad.
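
The in-memory step is just a set intersection. A minimal sketch,
assuming you have already fetched the set of document ids for each
word (the postings dict and docs_containing_all are hypothetical
names, and this is plain Python, not a Cassandra call):

```python
def docs_containing_all(postings, words):
    """Return the ids that appear in the id set of every given word."""
    sets = [postings.get(word, set()) for word in words]
    if not sets:
        return set()
    # Start from the smallest set so the working result stays small.
    sets.sort(key=len)
    result = set(sets[0])
    for ids in sets[1:]:
        result &= ids
        if not result:
            break  # no document can match once the result is empty
    return result

# Made-up id sets, as if read back for each word.
postings = {
    "michael": {"doc1", "doc2"},
    "jordan": {"doc1", "doc3"},
}

print(docs_containing_all(postings, ["michael", "jordan"]))
```

The cost is driven by how many ids you have to pull back per word, so
very common words are the ones that would hurt.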

This video talks about it briefly:


On Fri, Jan 8, 2010 at 14:12, ML_Seda <> wrote:
> Hey,
> I've been reading up on the Cassandra data model a bit, and would like to
> get some input from this forum on different techniques for a particular
> problem.
> Assume I need to index millions of text docs (e.g. research papers), and
> allow querying them by a given word inside or around any of the
> indexed docs. That is, if I search for terms, I would like to get a list of
> docs in which these terms show up (e.g. for "Michael Jordan", Michael is the
> main term and Jordan is the next term, n1. The same can be applied by
> indicating the terms preceding Michael).
> How do I model this in Cassandra?
> Would my Keys be a concat of the middle term + docid?  Will I be able to do
> queries by wildcarding the docid?
> Thanks.