cassandra-user mailing list archives

From Ted Zlatanov <...@lifelogs.com>
Subject Re: full text search
Date Thu, 25 Feb 2010 14:13:55 GMT
On Wed, 24 Feb 2010 15:41:07 -0800 Mohammad Abed <mohammad.abed@gmail.com> wrote: 

MA> Any suggestions on how to pursue full text search with Cassandra, what
MA> options are out there?

I've proposed a bitmask patch
(https://issues.apache.org/jira/browse/CASSANDRA-764) which would help
if your search word set is finite and can be expressed as a bitmask (as
part of the SliceRange).  For instance, 100,000 tags can be expressed
uncompressed in 100,000 bits, which is just a 12.5KB query.  If you use
inversion lists to compress the search string further, you can make the
query really tiny, a few bytes per specified tag even for huge search
spaces, but my patch doesn't do that yet.  If this is a viable option
for you, vote for the issue.
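
To make the size arithmetic concrete, here is a minimal sketch of the
uncompressed bitmask idea.  It is not the CASSANDRA-764 patch and does
not call any Cassandra API; the tag vocabulary, the tag-to-position
mapping, and all function names are hypothetical.  The sparse encoding
is a plain list of set-bit positions, used here only as a simple
stand-in for a compressed form.

```python
import struct


def build_tag_bitmask(selected_tags, tag_index, total_tags):
    """Return a bytes object with one bit set per selected tag."""
    mask = bytearray((total_tags + 7) // 8)  # one bit per tag, rounded up
    for tag in selected_tags:
        position = tag_index[tag]
        mask[position // 8] |= 1 << (position % 8)
    return bytes(mask)


def bit_is_set(mask, position):
    """Check whether the bit for a given tag position is set."""
    return bool(mask[position // 8] & (1 << (position % 8)))


def encode_sparse(selected_tags, tag_index):
    """Stand-in compressed form: each set position as a 4-byte integer."""
    positions = sorted(tag_index[tag] for tag in selected_tags)
    return struct.pack(f">{len(positions)}I", *positions)


# Hypothetical vocabulary of 100,000 tags named "tag0" .. "tag99999".
TOTAL_TAGS = 100_000
tag_index = {f"tag{i}": i for i in range(TOTAL_TAGS)}
wanted = ["tag3", "tag42", "tag99999"]

query_mask = build_tag_bitmask(wanted, tag_index, TOTAL_TAGS)
sparse_query = encode_sparse(wanted, tag_index)

mask_size_bytes = len(query_mask)      # 100,000 bits / 8
sparse_size_bytes = len(sparse_query)  # 4 bytes per specified tag
```

With these assumptions the full mask is 12,500 bytes regardless of how
many tags are selected, while the sparse form grows by 4 bytes per
selected tag.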

I'm using a similar approach on the client side, filtering the results
after I get them, but my search space is IP addresses and I'm filtering
on netmasks, so the netmask itself is the bitmask filter.
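
A minimal sketch of that kind of client-side filtering, assuming the
results arrive as IPv4 address strings: an address is kept when ANDing
it with the netmask yields the network address.  The function name and
the sample data are hypothetical; only the standard ipaddress module is
used.

```python
import ipaddress


def filter_by_netmask(addresses, network):
    """Keep the addresses that fall inside the given IPv4 network."""
    net = ipaddress.ip_network(network)
    mask_int = int(net.netmask)          # the netmask is the bitmask
    net_int = int(net.network_address)   # expected value after masking
    return [
        addr
        for addr in addresses
        if (int(ipaddress.ip_address(addr)) & mask_int) == net_int
    ]


# Hypothetical result set fetched from the store, filtered afterwards.
fetched = ["10.1.2.3", "10.2.0.1", "192.168.1.1", "10.1.255.254"]
matching = filter_by_netmask(fetched, "10.1.0.0/16")
```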

Whether full text search can be expressed as a supercolumn bitmasked
index really depends on your search domain.  See for instance
http://en.wikipedia.org/wiki/Knowledge_organization but the best way
to start cataloguing general information is to ask a librarian (I say
this having worked at a large search engine that employed dozens of them
to classify the web).

Ted

