incubator-cassandra-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Design questions/Schema help
Date Mon, 26 Jul 2010 23:46:24 GMT
We are thinking about using Cassandra to store our search logs. Can 
someone point me in the right direction/lend some guidance on design? I 
am new to Cassandra and I am having trouble wrapping my head around some 
of these new concepts. My brain keeps wanting to go back to an RDBMS design.

We will be storing the user query, the number of hits returned, and the 
session id. We would like to be able to answer the following questions:

- What are the n most popular queries and their counts within the last x 
(mins/hours/days/etc.)? Basically, the most popular searches within a 
given time range.
- What is the most popular query within the last x where hits = 0? Same 
as above but with an extra "where" clause.
- For session id x, give me all of its other queries.
- What are all the session ids that searched for 'foos'?

We currently accomplish the above with MySQL using two tables: one for 
the raw search log information and the other to keep the 
aggregate/running counts of queries.
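
To make the access patterns concrete, here is a rough sketch of the 
two-table approach and the four questions above. It is an illustration 
only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, 
and the table names (search_log, query_counts), column names, and helper 
functions are all hypothetical, not our actual schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_log (      -- raw search log (hypothetical layout)
    ts         INTEGER,        -- Unix timestamp of the search
    session_id TEXT,
    query      TEXT,
    hits       INTEGER
);
CREATE TABLE query_counts (    -- all-time running counts per query
    query TEXT PRIMARY KEY,
    total INTEGER NOT NULL
);
""")

def log_search(ts, session_id, query, hits):
    """Record one search in the raw log and bump its running count."""
    conn.execute("INSERT INTO search_log VALUES (?, ?, ?, ?)",
                 (ts, session_id, query, hits))
    conn.execute("INSERT OR IGNORE INTO query_counts VALUES (?, 0)", (query,))
    conn.execute("UPDATE query_counts SET total = total + 1 WHERE query = ?",
                 (query,))

def top_queries(since_ts, n, zero_hits_only=False):
    """Top n (query, count) pairs since since_ts, optionally hits = 0 only."""
    sql = "SELECT query, COUNT(*) AS c FROM search_log WHERE ts >= ?"
    if zero_hits_only:
        sql += " AND hits = 0"
    # Ties are broken alphabetically so the result is deterministic.
    sql += " GROUP BY query ORDER BY c DESC, query LIMIT ?"
    return conn.execute(sql, (since_ts, n)).fetchall()

def queries_for_session(session_id):
    """All queries issued by one session, oldest first."""
    rows = conn.execute(
        "SELECT query FROM search_log WHERE session_id = ? ORDER BY ts",
        (session_id,))
    return [r[0] for r in rows]

def sessions_for_query(query):
    """All distinct session ids that searched for the given query."""
    rows = conn.execute(
        "SELECT DISTINCT session_id FROM search_log "
        "WHERE query = ? ORDER BY session_id", (query,))
    return [r[0] for r in rows]
```

For example, top_queries(since_ts, 10) would cover the first question, 
and top_queries(since_ts, 1, zero_hits_only=True) the second. Each 
question here is just a different WHERE/GROUP BY over the same raw 
table, which is the part I am unsure how to express in Cassandra.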

Would this sort of ad-hoc querying be better implemented using Hadoop + 
Hive? If so, should I be storing all this information in Cassandra then 
using Hadoop to retrieve it?

Thanks for your suggestions.
