hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/MapReduce" by stack
Date Thu, 03 Jul 2008 19:42:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/MapReduce

The comment on the change is:
Updated link so points at latest version

------------------------------------------------------------------------------
  When reading from hbase, the !TableInputFormat asks hbase for the list of regions and creates
one map per region.  When writing, it may make sense to skip the reduce step and write back into
hbase from inside your map.  Do this when your job does not need the sort and collation that
MR performs in its reduce; hbase sorts on insert, so there is no point in double-sorting (and
shuffling data around your MR cluster) unless you need to.  If you do not need the reduce, you
might have your map emit counts of records processed, just so the framework can print its
report of records processed when the job is done.  If running the reduce step does make sense
in your case, it's better to have lots of reducers so that load is spread across the hbase cluster.
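
The map-only idea above can be sketched in plain Java, with no Hadoop or HBase dependencies. This is only an illustration under stated assumptions: a `TreeMap` stands in for an hbase table (it keeps keys sorted on insert), an `AtomicLong` stands in for a framework counter, and the names `MapOnlySketch`, `map`, `table`, and `recordsProcessed` are hypothetical rather than part of any real API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class MapOnlySketch {
    // Assumption: stand-in for an hbase table; a TreeMap keeps rows sorted by key on insert.
    static final Map<String, String> table = new TreeMap<>();
    // Assumption: stand-in for a framework counter reported when the job finishes.
    static final AtomicLong recordsProcessed = new AtomicLong();

    // Hypothetical map function: writes straight into the table, so no reduce step is needed.
    static void map(String rowKey, String value) {
        table.put(rowKey, value);
        recordsProcessed.incrementAndGet();
    }

    public static void main(String[] args) {
        // Input arrives in arbitrary order, as it would from an unsorted source.
        List<String[]> input = List.of(
            new String[] {"row-c", "3"},
            new String[] {"row-a", "1"},
            new String[] {"row-b", "2"});
        for (String[] record : input) {
            map(record[0], record[1]);
        }
        // The store did the sorting; the job never sorted or shuffled anything itself.
        System.out.println(table.keySet());
        System.out.println("records processed: " + recordsProcessed.get());
    }
}
```

The rows come back in key order even though the job never sorted them, which is why a separate sort in a reduce step would be redundant here. In a real job the write would go to hbase and the count to the framework's counters.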
  
  = Sample MR+HBase Jobs =
- A [http://www.nabble.com/Re%3A-Map-Reduce-over-HBase---sample-code-p18126819.html students/classes
example] by Naama Kraus.
+ A [http://www.nabble.com/Re%3A-Map-Reduce-over-HBase---sample-code-p18253120.html students/classes
example] by Naama Kraus.
  
  == Sample MR Bulk Uploader ==
  Read the class comment below for specification of inputs, prerequisites, etc.
