incubator-cassandra-user mailing list archives

From "Andrey V. Panov" <panov.a...@gmail.com>
Subject Re: design that mimics twitter tweet search
Date Mon, 19 Mar 2012 02:05:31 GMT
Why do you suppose they did search on Cassandra?

On 19 March 2012 00:16, Sasha Dolgy <sdolgy@gmail.com> wrote:

> yes -- but given i have two keywords, and want to find all tweets that
> have "cassandra" and "bestest" ... means, retrieving all columns + values
> in each row, iterating through both to see if tweet id's in one, exist in
> the other and finishing up with a consolidated list of tweet id's that only
> exist in both.  just seems clunky to me ... ?
>
>
> On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud <benoit@noisette.ch> wrote:
>
>> The simplest modeling you could have is using the keyword as key, a
>> timestamp/time UUID as column name and the tweetid as value
>>
>> -> cf['keyword']['timestamp'] = tweetid
>>
>> then you do a range query to get all tweetid sorted by time (you may
>> want them in reverse order) and you can limit to the number of tweets
>> displayed on the page.
>>
>> As some rows can become large, you could use key partitioning by
>> concatenating, for instance, the keyword with the month and year.
>>
>>
>> 2012/3/18 Sasha Dolgy <sdolgy@gmail.com>:
>> > Hi All,
>> >
>> > With twitter, when I search for words like:  "cassandra is the bestest", 4
>> > tweets will appear, including one i just did.  My understanding is that the
>> > internals of twitter work in that each word in a tweet is allocated,
>> > irrespective of the presence of a # hash tag, and the tweet id is assigned
>> > to a row for that word.  What is puzzling to me, and I am hopeful that some
>> > smart people on here can shed some light on -- is how would this work with
>> > Cassandra?
>> >
>> > row [ cassandra ]: key -> tweetid  / timestamp
>> > row [ bestest ]: key -> tweetid / timestamp
>> >
>> > I had thought that I could simply pull a list of all column names from each
>> > row (representing each word) and flag all occurrences (tweet id's) that
>> > exist in each row ... however, these rows would get quite long over time.
>> >
>> > Am I missing an easier way to get a list of all "tweetid's" that exist in
>> > multiple rows?
>> >
>> > --
>> > Sasha Dolgy
>> > sasha.dolgy@gmail.com
>>
>>
>>
>> --
>> sent from my Nokia 3210
>>
>
>
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com
>
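
For illustration, the scheme discussed in the quoted thread (keyword as row key, timestamp as column name, tweet id as value, rows bucketed by month and year, and a client-side intersection for multi-keyword search) can be sketched as below. This is only an in-memory stand-in, not Cassandra client code: the class `KeywordIndex`, its method names, and the `keyword:YYYY-MM` row key format are all assumptions made for the example, and tokenizing is a plain whitespace split.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone


class KeywordIndex:
    """Hypothetical in-memory model of cf[row_key][timestamp] = tweetid."""

    def __init__(self):
        # row key -> list of (timestamp, tweet_id) kept sorted by timestamp,
        # mimicking columns ordered by a time-based column name
        self.rows = defaultdict(list)

    @staticmethod
    def row_key(keyword, ts):
        # Assumed bucketing: keyword plus year and month, e.g. "cassandra:2012-03"
        return f"{keyword.lower()}:{ts:%Y-%m}"

    def add_tweet(self, tweet_id, text, ts):
        # One column per distinct word, written to that word's monthly row
        for word in set(text.lower().split()):
            insort(self.rows[self.row_key(word, ts)], (ts, tweet_id))

    def latest(self, keyword, bucket_ts, limit=20):
        # Reverse-order range read over one row, capped at `limit`
        row = self.rows.get(self.row_key(keyword, bucket_ts), [])
        return [tweet_id for _, tweet_id in reversed(row)][:limit]

    def search_all(self, keywords, bucket_ts, limit=20):
        # Client-side intersection: tweet ids present in every keyword's row
        rows = [self.rows.get(self.row_key(k, bucket_ts), []) for k in keywords]
        if not rows:
            return []
        rows.sort(key=len)  # start from the shortest row to keep the set small
        common = {tweet_id for _, tweet_id in rows[0]}
        for row in rows[1:]:
            common &= {tweet_id for _, tweet_id in row}
        newest_first = [tid for _, tid in reversed(rows[0]) if tid in common]
        return newest_first[:limit]


if __name__ == "__main__":
    march = lambda day: datetime(2012, 3, day, tzinfo=timezone.utc)
    idx = KeywordIndex()
    idx.add_tweet(1, "cassandra is the bestest", march(17))
    idx.add_tweet(2, "cassandra rocks", march(18))
    idx.add_tweet(3, "Bestest Cassandra ever", march(19))
    print(idx.latest("cassandra", march(1), limit=2))            # [3, 2]
    print(idx.search_all(["cassandra", "bestest"], march(1)))    # [3, 1]
```

The intersection still reads every column of each keyword row in the bucket, which is the "clunky" step raised in the thread; the monthly bucketing only bounds how long each row can grow.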
