incubator-cassandra-user mailing list archives

From Benoit Perroud <ben...@noisette.ch>
Subject Re: design that mimics twitter tweet search
Date Sun, 18 Mar 2012 15:12:11 GMT
The simplest model you could use is the keyword as the row key, a
timestamp or time UUID as the column name, and the tweet ID as the value:

-> cf['keyword']['timestamp'] = tweetid

Then you do a range query on that row to get all tweet IDs sorted by
time (you may want them in reverse order), and you can limit the result
to the number of tweets displayed on the page.
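
To make the access pattern concrete, here is a rough in-memory sketch in
Python. It is not Cassandra client code: the KeywordIndex class and its
insert/latest methods are hypothetical names, and a sorted list stands in
for the columns of one row. It only illustrates "columns sorted by
timestamp, read newest first, with a limit".

```python
import bisect
from collections import defaultdict


class KeywordIndex:
    """Toy stand-in for a column family: row key -> columns sorted by name."""

    def __init__(self):
        # keyword -> list of (timestamp, tweet_id), kept sorted by timestamp
        self._rows = defaultdict(list)

    def insert(self, keyword, timestamp, tweet_id):
        # Equivalent of cf[keyword][timestamp] = tweet_id
        bisect.insort(self._rows[keyword], (timestamp, tweet_id))

    def latest(self, keyword, limit=20, before=None):
        """Return up to `limit` tweet IDs, newest first.

        If `before` is given, only columns with timestamp < before are
        returned, which is how the next page would be fetched.
        """
        columns = self._rows.get(keyword, [])
        if before is None:
            end = len(columns)
        else:
            # (before,) sorts ahead of any (before, tweet_id) tuple
            end = bisect.bisect_left(columns, (before,))
        start = max(0, end - limit)
        return [tweet_id for _, tweet_id in reversed(columns[start:end])]
```

For example, after inserting three tweets for "cassandra" with timestamps
1, 2 and 3, latest("cassandra", limit=2) returns the tweets at 3 and 2,
and latest("cassandra", limit=2, before=3) returns the following page.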

As some rows can become large, you could partition the key by
concatenating, for instance, the keyword with the month and year.
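
A possible way to build such keys, sketched in Python. The "keyword:YYYY-MM"
layout, the ":" separator and both function names are assumptions made for
illustration; any unambiguous concatenation would do.

```python
from datetime import datetime


def bucketed_row_key(keyword, timestamp):
    """Build a row key such as 'cassandra:2012-03' for a tweet's timestamp."""
    return f"{keyword}:{timestamp.year:04d}-{timestamp.month:02d}"


def recent_row_keys(keyword, now, months):
    """Row keys to read, newest bucket first, covering the last `months` months."""
    keys = []
    year, month = now.year, now.month
    for _ in range(months):
        keys.append(f"{keyword}:{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            # Step back across the year boundary
            year, month = year - 1, 12
    return keys
```

A reader would query the newest bucket first and only move on to older
buckets if the page is not yet full. The trade-off is that a search can
touch several rows instead of one.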


2012/3/18 Sasha Dolgy <sdolgy@gmail.com>:
> Hi All,
>
> With Twitter, when I search for words like "cassandra is the bestest", 4
> tweets will appear, including one I just posted.  My understanding is that
> the internals of Twitter work such that each word in a tweet is allocated,
> irrespective of the presence of a # hash tag, and the tweet id is assigned
> to a row for that word.  What is puzzling to me, and I am hopeful that some
> smart people on here can shed some light on it, is how this would work with
> Cassandra.
>
> row [ cassandra ]: key -> tweetid  / timestamp
> row [ bestest ]: key -> tweetid / timestamp
>
> I had thought that I could simply pull a list of all column names from each
> row (representing each word) and flag all occurrences (tweet id's) that
> exist in each row ... however, these rows would get quite long over time.
>
> Am I missing an easier way to get a list of all "tweetid's" that exist in
> multiple rows?
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com



-- 
sent from my Nokia 3210
