hbase-user mailing list archives

From Dmitry Chechik <dmi...@tellapart.com>
Subject Re: Analysing slow HBase mapreduce performance
Date Wed, 17 Mar 2010 04:26:51 GMT
That did it. Thanks!

On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Did you set scanner caching higher?
>
> J-D
>
> On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmitry@tellapart.com> wrote:
> > Hi all,
> >
> > I'm trying to analyse some issues with HBase performance in a MapReduce job.
> >
> > I'm running a mapreduce which reads a table and just writes it out to HDFS.
> > The table is small: roughly 400 MB of data and 18M rows.
> > I've pre-split the table into 32 regions, so that I'm not running into
> > the problem of having one region server serve the entire table.
> >
> > I'm running an HBase cluster with:
> > - 16 region servers (each on the same machine as a Hadoop tasktracker and
> > datanode).
> > - 1 master (on the same machine as the Hadoop jobtracker and namenode.)
> > - Zookeeper quorum of just 1 machine (on the same machine as the master).
> >
> > I have LZO compression enabled for both HBase and Hadoop.
> >
> > Running this job takes about 5-6 minutes.
> >
> > Running a mapreduce reading the exact same set of data from a
> > SequenceFile on HDFS takes only about 1 minute.
> >
> > What else can I do to try to diagnose this?
> >
> > Thanks,
> >
> > - Dmitry
> >
>
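
For context on why raising scanner caching helped: each scanner fetch is a round trip to a region server, and the number of rows returned per fetch is set by the client property `hbase.client.scanner.caching` (or by `Scan.setCaching(int)` on the scan handed to the job). The sketch below is only back-of-the-envelope arithmetic, not HBase code. It assumes one map task per region (32 here), rows spread evenly across regions, and a starting caching value of 1, which I believe was the default in HBase releases of that period. The values 100 and 1000 are illustrative; the thread does not say which value was actually used.

```java
public class ScannerCachingEstimate {

    // Round trips needed to read `rows` rows when each fetch
    // returns up to `caching` rows (ceiling division).
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 18_000_000L;  // from the thread: about 18M rows
        int mapTasks = 32;        // assumption: one map task per region
        int[] cachingValues = {1, 100, 1000};  // illustrative settings

        // Assumption: rows are spread evenly across the 32 regions.
        long rowsPerTask = rows / mapTasks;

        for (int caching : cachingValues) {
            System.out.println("caching=" + caching
                    + " roundTripsPerTask=" + roundTrips(rowsPerTask, caching));
        }
    }
}
```

Under these assumptions each map task goes from 562,500 round trips at a caching value of 1 to a few thousand or fewer, which is consistent with the job time dropping once the setting was raised. Larger caching values also mean more rows held in client memory per fetch, so the right value depends on row size.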
