hadoop-common-user mailing list archives

From Jonathan Bishop <jbishop....@gmail.com>
Subject hbase puts in map tasks don't seem to run in parallel
Date Sun, 03 Jun 2012 01:25:48 GMT

I am new to hadoop and hbase, but have spent the last few weeks learning as
much as I can...

I am attempting to create an hbase table during a hadoop job by simply
doing puts to a table from each map task. I am hoping that each map task
will use the regionserver on its node so that all 10 of my nodes are
putting values into the table at the same time.

Here is my map class below. The Node class is a simple data structure which
knows how to parse a line of input and create a Put for hbase.

When I run this I see that only one region server is active for the table I
am creating. I know that my input file is split among all 10 of my data
nodes, and I know that if I do not do puts to the hbase table everything
runs in parallel on all 10 machines. It is only when I start doing hbase
puts that the run times go way up.



public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {

    HTableInterface table = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Open the table once per task, using the job configuration.
        String tableName = context.getConfiguration().get(TABLE);
        table = new HTable(context.getConfiguration(), tableName);
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        Node node = null;
        try {
            node = Node.parseNode(value.toString());
        } catch (ParseException e) {
            throw new IOException(e);
        }
        Put put = node.getPut();
        table.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        table.close();
    }
}
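(Note for readers of the archive: a likely cause of the behavior described above is that a newly created HBase table starts with a single region, so every put lands on one regionserver until the table splits. A commonly suggested mitigation is to pre-split the table and salt the row keys so consecutive keys spread across regions. Below is a minimal, self-contained sketch of the salting part only; the class name `SaltedKey` and the bucket count are illustrative, and it assumes the table was pre-split on the same two-digit prefixes.)

```java
import java.nio.charset.StandardCharsets;

public class SaltedKey {
    // Illustrative bucket count; in practice match the number of pre-split regions.
    static final int BUCKETS = 10;

    // Prepend a deterministic two-digit salt so sequential row keys
    // hash out across the pre-split regions instead of one hot region.
    static byte[] salt(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        String salted = String.format("%02d-%s", bucket, rowKey);
        return salted.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The salted key would then be used as the Put's row key.
        System.out.println(new String(salt("node-0001"), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that scans over a natural key range must now fan out over all salt buckets, so this fits write-heavy, point-read workloads best.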
