hbase-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject optimising loading of tab file
Date Wed, 22 Jul 2009 12:26:19 GMT
Hi all,

I have a 70G sparsely populated tab file (74 columns) to load into 2
column families in a single HBase table.

I am running on my tiny dev cluster (4 Mac Minis, 4G RAM, each running
all Hadoop daemons and RegionServers) just to familiarise myself while
the proper rack is being set up.

I wrote a MapReduce job where I load into HBase during the Map:
  String rowID = UUID.randomUUID().toString();
  Put row = new Put(rowID.getBytes());
  // readAllInto uses a properties file to map tab columns to column families
  int fields = reader.readAllInto(splits, row);
  context.setStatus("Map updating cell for row[" + rowID + "] with " + fields + " fields");
  table.put(row);
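As an aside on the snippet above, `String.getBytes()` with no argument uses the platform default encoding, so pinning the charset is safer when the bytes become row keys. A minimal, HBase-free sketch of the key generation (class name is mine, for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class RowKeyDemo {
    // Build a row key the same way as in the map(): a random UUID string.
    // Random keys spread writes evenly across regions, at the cost of
    // losing any meaningful scan order.
    static byte[] newRowKey() {
        String rowID = UUID.randomUUID().toString();
        // Pin the charset; plain getBytes() depends on the platform default.
        return rowID.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = newRowKey();
        // A UUID string is always 36 characters: 32 hex digits + 4 dashes.
        System.out.println(key.length);
    }
}
```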

Is this the preferred way to do this kind of loading, or is
TableOutputFormat likely to outperform the Map version?

[I know performance estimates are pointless on this cluster, but I see
500 records per second on input, which is a bit disappointing.  I have
default Hadoop and HBase config and had to put a ZK quorum member on
each node to get HBase to start.]
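On the ZK point: in a distributed setup HBase expects the quorum hosts to be listed via `hbase.zookeeper.quorum` in `hbase-site.xml` on every node; something along these lines (hostnames below are placeholders):

```xml
<!-- hbase-site.xml: hostnames are placeholders for the quorum members -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>mini1,mini2,mini3</value>
</property>
```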

Cheers,

Tim
