hbase-user mailing list archives

From Dmitry <dmi...@tellapart.com>
Subject Analysing slow HBase mapreduce performance
Date Wed, 17 Mar 2010 04:10:59 GMT
Hi all,

I'm trying to analyse some HBase performance issues in a MapReduce job.

I'm running a MapReduce job which reads a table and just writes it out to HDFS.
The table is small: roughly 400M of data and 18M rows.
I've pre-split the table into 32 regions, so that I don't run into the
problem of a single region server serving the entire table.

I'm running an HBase cluster with:
- 16 region servers (each on the same machine as a Hadoop tasktracker and
- 1 master (on the same machine as the Hadoop jobtracker and namenode.)
- Zookeeper quorum of just 1 machine (on the same machine as the master).

I have LZO compression enabled for both HBase and Hadoop.

Running this job takes about 5-6 minutes.

Running a MapReduce job that reads the exact same set of data from a
SequenceFile on HDFS takes only about 1 minute.
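
To put the two timings side by side, here is a rough back-of-the-envelope calculation using the figures above. It is only a sketch: it assumes "400M" means 400 MB, takes 5.5 minutes as the midpoint of the 5-6 minute range, and assumes the 32 regions are spread evenly over the 16 region servers. The variable and function names are made up for illustration.

```python
# Figures quoted in this message. Reading "400M" as 400 MB and using the
# 5.5-minute midpoint of "5-6 minutes" are assumptions.
ROWS = 18_000_000
DATA_MB = 400
REGIONS = 32
REGION_SERVERS = 16


def throughput(seconds):
    """Return (rows per second, MB per second) for a job of this duration."""
    return ROWS / seconds, DATA_MB / seconds


hbase_rows, hbase_mb = throughput(5.5 * 60)  # job reading the HBase table
seq_rows, seq_mb = throughput(1 * 60)        # job reading the SequenceFile
slowdown = seq_rows / hbase_rows

# Assumes regions are balanced evenly across region servers.
regions_per_server = REGIONS / REGION_SERVERS

print(f"HBase scan:   {hbase_rows:,.0f} rows/s, {hbase_mb:.2f} MB/s")
print(f"SequenceFile: {seq_rows:,.0f} rows/s, {seq_mb:.2f} MB/s")
print(f"Slowdown:     {slowdown:.1f}x")
print(f"Regions per region server: {regions_per_server:.0f}")
```

Under those assumptions the HBase job moves only a little over 1 MB/s across the whole cluster, which suggests the time is going somewhere other than raw disk or network bandwidth.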

What else can I do to try to diagnose this?


- Dmitry
