incubator-cassandra-user mailing list archives

From Sasha Dolgy <sdo...@gmail.com>
Subject Re: design that mimics twitter tweet search
Date Sun, 18 Mar 2012 15:16:07 GMT
yes -- but given i have two keywords, and want to find all tweets that have
"cassandra" and "bestest" ... that means retrieving all columns + values in
each row, iterating through both to see which tweet ids in one exist in the
other, and finishing up with a consolidated list of tweet ids that exist in
both.  just seems clunky to me ... ?
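
To make the clunky part concrete, here is a minimal sketch of that client-side
intersection. It is not a Cassandra client: a plain dict stands in for the
column family (row key -> {timestamp: tweetid}), and the names bucket_key and
tweets_with_all, plus the "keyword:YYYYMM" row key format, are assumptions made
up for illustration. The month bucket follows the key partitioning idea in the
quoted reply.

```python
from datetime import datetime, timezone


def bucket_key(keyword, when):
    """Build a row key like 'cassandra:201203' (hypothetical key format)."""
    return f"{keyword}:{when:%Y%m}"


def tweets_with_all(index, keywords, when):
    """Return the tweet ids present in every keyword's row for one month.

    `index` is an in-memory stand-in for the column family:
    row key -> {timestamp: tweetid}. A real client would fetch each
    row (or a slice of it) from the cluster instead.
    """
    rows = [set(index.get(bucket_key(k, when), {}).values()) for k in keywords]
    if not rows:
        return set()
    # Start from the shortest row so the working set stays small.
    rows.sort(key=len)
    result = rows[0]
    for row in rows[1:]:
        result &= row
        if not result:
            break  # nothing in common, no need to look at more rows
    return result


# Made-up sample data, purely illustrative.
WHEN = datetime(2012, 3, 18, tzinfo=timezone.utc)
INDEX = {
    "cassandra:201203": {1: "t1", 2: "t2", 3: "t3"},
    "bestest:201203": {2: "t2", 4: "t4"},
}

if __name__ == "__main__":
    print(tweets_with_all(INDEX, ["cassandra", "bestest"], WHEN))
```

Every row still has to be read in full before intersecting, which is exactly
the cost i am worried about as the rows grow.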

On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud <benoit@noisette.ch> wrote:

> The simplest modeling you could have is using the keyword as key, a
> timestamp/time UUID as column name and the tweetid as value
>
> -> cf['keyword']['timestamp'] = tweetid
>
> then you do a range query to get all tweetid sorted by time (you may
> want them in reverse order) and you can limit to the number of tweets
> displayed on the page.
>
> As some rows can become large, you could use key partitioning, for
> instance by concatenating the keyword with the month and year.
>
>
> 2012/3/18 Sasha Dolgy <sdolgy@gmail.com>:
> > Hi All,
> >
> > With twitter, when I search for words like:  "cassandra is the bestest", 4
> > tweets will appear, including one i just did.  My understanding is that the
> > internals of twitter work in that each word in a tweet is allocated a row,
> > irrespective of the presence of a # hash tag, and the tweet id is assigned
> > to the row for that word.  What is puzzling to me, and i am hopeful that
> > some smart people on here can shed some light on it -- is how would this
> > work with Cassandra?
> >
> > row [ cassandra ]: key -> tweetid  / timestamp
> > row [ bestest ]: key -> tweetid / timestamp
> >
> > I had thought that I could simply pull a list of all column names from each
> > row (representing each word) and flag all occurrences (tweet id's) that
> > exist in each row ... however, these rows would get quite long over time.
> >
> > Am I missing an easier way to get a list of all "tweetid's" that exist in
> > multiple rows?
> >
> > --
> > Sasha Dolgy
> > sasha.dolgy@gmail.com
>
>
>
> --
> sent from my Nokia 3210
>



-- 
Sasha Dolgy
sasha.dolgy@gmail.com
