hbase-user mailing list archives

From shashwat shriparv <dwivedishash...@gmail.com>
Subject Re: Hbase for real-time data aggregation
Date Fri, 06 Jan 2012 19:23:51 GMT
In my experience, HBase is not a bad choice for this. The only problem is
that you will not get ready-made solutions. If you go with it, look at the
indexing options available for HBase. You can try the HSearch and Lily
projects for indexing and fast retrieval.

On Fri, Jan 6, 2012 at 11:25 PM, prasenjit mukherjee wrote:

> I need to design a near real-time system where documents (with
> fields: id, keywords, timestamp) are added to the system. The
> requirement is to get the top-k keywords from the documents added to
> the system in the last x minutes. The typical document addition rate is
> around 100 documents/sec, which may increase in the future (hence the
> technology should be horizontally scalable).
> I am thinking of using HBase. For each document we can add a set of
> keys (one for each keyword in that doc) of the form timestamp_keyword.
> At query time we can run a map-reduce job over a key range (from
> ts1_* to ts2_*) to compute the keyword frequency for that range.
> Are there any better technologies for this use case, like MongoDB,
> Cassandra, Storm, etc.? The use case is primarily aggregation.
> -prasen
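
To make the proposed key layout concrete, here is a minimal sketch in
Python. It is not HBase code: SortedKeyStore is a hypothetical in-memory
stand-in for a table sorted by row key, and the names row_key, KEY_WIDTH
and top_k_keywords are made up for illustration. It also assumes that
timestamps are zero-padded milliseconds, that a doc id is appended to keep
keys unique, and that keywords and doc ids contain no "_". The counting is
done in-process rather than with a map-reduce job.

```python
import bisect
from collections import Counter

# Assumption: zero-pad timestamps so string order matches numeric order.
KEY_WIDTH = 13


def row_key(ts_ms, keyword, doc_id):
    """Build a hypothetical row key: <timestamp>_<keyword>_<doc id>."""
    return f"{ts_ms:0{KEY_WIDTH}d}_{keyword}_{doc_id}"


class SortedKeyStore:
    """In-memory stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._keys = []

    def add_document(self, doc_id, keywords, ts_ms):
        # One key per distinct keyword in the document, as in the proposal.
        for kw in set(keywords):
            bisect.insort(self._keys, row_key(ts_ms, kw, doc_id))

    def scan(self, start_ts_ms, stop_ts_ms):
        """Return keys with start_ts_ms <= timestamp < stop_ts_ms."""
        lo = bisect.bisect_left(self._keys, f"{start_ts_ms:0{KEY_WIDTH}d}_")
        hi = bisect.bisect_left(self._keys, f"{stop_ts_ms:0{KEY_WIDTH}d}_")
        return self._keys[lo:hi]


def top_k_keywords(store, start_ts_ms, stop_ts_ms, k):
    """Count keyword occurrences in the time window and return the top k."""
    counts = Counter()
    for key in store.scan(start_ts_ms, stop_ts_ms):
        _, rest = key.split("_", 1)          # drop the timestamp prefix
        keyword, _ = rest.rsplit("_", 1)     # drop the doc id suffix
        counts[keyword] += 1
    return counts.most_common(k)


store = SortedKeyStore()
store.add_document("d1", ["hbase", "storm"], 1000)
store.add_document("d2", ["hbase"], 2000)
store.add_document("d3", ["hbase", "cassandra"], 3000)
print(top_k_keywords(store, 1000, 3000, 2))
```

Because the timestamp leads the key, the window query touches only one
contiguous key range. The flip side is that all new writes land at the end
of the key space, which is worth keeping in mind when the write rate grows.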

Shashwat Shriparv
