hbase-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Hbase for real-time data aggregation
Date Fri, 06 Jan 2012 17:55:55 GMT
I need to design a near-real-time system to which documents (with
fields id, keywords, timestamp) are continuously added. The
requirement is to get the top-k keywords from the documents added to
the system in the last x minutes. The typical document addition rate is
around 100 documents/sec, which may increase in the future (hence the
technology should be horizontally scalable).

I am thinking of using HBase. For each document we can add a set of
row keys (one per keyword in that doc) of the form timestamp_keyword.
At query time we can run a map-reduce job over a key range (from
ts1_* to ts2_*) to compute the keyword frequency for that range.
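
To make the key scheme concrete, here is a rough sketch of the idea in
Python. It does not talk to HBase: SortedKeyStore is a hypothetical
in-memory stand-in for a table sorted by row key, and the helper names
(make_row_key, add_document, top_k_keywords) are made up for
illustration. Two assumptions go beyond the description above: the
timestamp is zero-padded so that lexicographic key order matches time
order, and the doc id is appended so that two documents with the same
timestamp and keyword do not overwrite each other.

```python
import bisect
from collections import Counter


def make_row_key(ts_ms, keyword, doc_id):
    # Zero-pad the timestamp (assumption) so string order == time order.
    # The doc id suffix (assumption) keeps keys unique per document.
    return f"{ts_ms:013d}_{keyword}_{doc_id}"


class SortedKeyStore:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        bisect.insort(self._keys, key)

    def scan(self, start_key, stop_key):
        # Half-open range scan: start_key inclusive, stop_key exclusive.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return self._keys[lo:hi]


def add_document(store, doc_id, keywords, ts_ms):
    # One key per distinct keyword in the document.
    for keyword in set(keywords):
        store.put(make_row_key(ts_ms, keyword, doc_id))


def top_k_keywords(store, ts1_ms, ts2_ms, k):
    # Scan keys from ts1_* through ts2_* (both timestamps inclusive).
    start = f"{ts1_ms:013d}_"
    stop = f"{ts2_ms + 1:013d}_"
    counts = Counter()
    for key in store.scan(start, stop):
        _, rest = key.split("_", 1)
        # Assumes doc ids contain no underscore; keywords may.
        keyword, _ = rest.rsplit("_", 1)
        counts[keyword] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    store = SortedKeyStore()
    add_document(store, "d1", ["hbase", "storm"], 1000)
    add_document(store, "d2", ["hbase"], 2000)
    add_document(store, "d3", ["hbase", "cassandra"], 5000)
    print(top_k_keywords(store, 1000, 2000, 1))
```

In this sketch the counting happens in a single loop on the client; in
the design described above, that step is what the map-reduce job over
the key range would do.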

Are there any better technologies for this use case, such as MongoDB,
Cassandra, or Storm? The use case is primarily about aggregation.

