hbase-user mailing list archives

From imbmay <br...@media6degrees.com>
Subject Table Updates with Map/Reduce
Date Fri, 18 Jul 2008 20:41:47 GMT

I want to use HBase to maintain a very large dataset that needs to be
updated more or less continuously.  I'm creating a record for each entity,
including a creation-timestamp column as well as between 10 and 1000
additional columns named for distinct events related to the record's entity.
Being new to HBase, I've taken the approach of writing a map/reduce app
that, for each input record:

1. Does a lookup in the table using HTable get(row, column) on the timestamp
column to determine whether there is an existing row for the entity.
2. If there is no existing record for the entity, adds the entity's event
history to the table, one column per unique event id.
3. If there is an existing record for the entity, adds only the most recent
event to the table.
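For what it's worth, the per-record logic I have in mind looks roughly like the
sketch below. To keep it self-contained I've stood in an in-memory map for the
HBase table (the real job would use HTable get/put instead), and the column
names "meta:created" and "events:<id>" are purely illustrative, not anything
HBase requires:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the HBase table: rowKey -> (column -> value).
// In the real map/reduce job these would be HTable.get(...) / HTable.put(...).
public class EntityEventUpsert {
    static final String CREATED = "meta:created"; // illustrative column name

    final Map<String, Map<String, String>> table = new HashMap<>();

    // Process one input record for an entity: check for an existing row via
    // the creation-timestamp column, then either insert the full event
    // history or just append the newest event.
    void process(String entityKey, String eventId, String eventValue,
                 Map<String, String> history, long nowMillis) {
        Map<String, String> row = table.get(entityKey);

        // Step 1: look up the timestamp column to see if the row exists.
        if (row == null || !row.containsKey(CREATED)) {
            // Step 2: no existing record -> create the row with its
            // creation timestamp and one column per historical event id.
            row = new HashMap<>();
            row.put(CREATED, Long.toString(nowMillis));
            for (Map.Entry<String, String> e : history.entrySet()) {
                row.put("events:" + e.getKey(), e.getValue());
            }
            table.put(entityKey, row);
        }

        // Step 3: in either case, record the most recent event as a column.
        row.put("events:" + eventId, eventValue);
    }
}
```

The point of the sketch is just the check-then-write pattern per input record;
whether that random get per record is sane at map/reduce volume is exactly what
I'm asking about.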

I'd like feedback on whether this is a reasonable approach in terms of
general performance and reliability, whether there is a different pattern
better suited to HBase with map/reduce, or whether I should be using
map/reduce for this at all.

Thanks in advance. 

View this message in context: http://www.nabble.com/Table-Updates-with-Map-Reduce-tp18537368p18537368.html
Sent from the HBase User mailing list archive at Nabble.com.
