hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Analysing slow HBase mapreduce performance
Date Wed, 17 Mar 2010 04:14:06 GMT
Did you set scanner caching higher?

J-D
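J-D's question points at the client-side scanner cache. Each scanner `next()` round trip to the region server returns at most `caching` rows, so a low setting turns a full-table scan into roughly one RPC per row. The sketch below is a rough, hypothetical model (not HBase code) that counts those round trips using the figures from Dmitry's message. It assumes one map task per region, ignores the extra RPCs needed to open and close each scanner, and starts from a caching value of 1, which was the client default in HBase releases of that era. In a real job the value is set with `Scan.setCaching(int)` on the `Scan` passed to `TableMapReduceUtil.initTableMapperJob`, or cluster-wide through the `hbase.client.scanner.caching` property.

```java
/**
 * Back-of-the-envelope model of scanner round trips for a full-table scan.
 * Hypothetical helper for illustration only; not part of the HBase API.
 */
final class ScannerCachingEstimate {

    /** RPCs needed to pull `rows` rows when each RPC returns up to `caching` rows. */
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        // Ceiling division: a partial final batch still costs one RPC.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long totalRows = 18_000_000L; // row count from the quoted message
        int regions = 32;             // pre-split regions, assumed one map task each
        long rowsPerRegion = totalRows / regions;

        for (int caching : new int[] {1, 100, 1000}) {
            long perTask = roundTrips(rowsPerRegion, caching);
            // Total counts each region's scanner separately, since batches
            // do not span region boundaries.
            System.out.printf("caching=%d: %d RPCs per map task, %d in total%n",
                    caching, perTask, perTask * regions);
        }
    }
}
```

Under these assumptions, raising caching from 1 to 1000 cuts each map task from 562,500 round trips to 563. The trade-off is client memory: every cached batch is held in the mapper until it is consumed, so very large rows call for a smaller value.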

On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmitry@tellapart.com> wrote:
> Hi all,
>
> I'm trying to analyse some HBase performance issues in a MapReduce job.
>
> I'm running a MapReduce job that reads a table and just writes it out to HDFS.
> The table is small: roughly 400 MB of data and 18M rows.
> I've pre-split the table into 32 regions, so that I'm not running into the
> problem of having one region server serve the entire table.
>
> I'm running an HBase cluster with:
> - 16 region servers (each on the same machine as a Hadoop tasktracker and
> datanode).
> - 1 master (on the same machine as the Hadoop jobtracker and namenode).
> - ZooKeeper quorum of just 1 machine (on the same machine as the master).
>
> I have LZO compression enabled for both HBase and Hadoop.
>
> Running this job takes about 5-6 minutes.
>
> Running a MapReduce job that reads the exact same data from a SequenceFile
> on HDFS takes only about 1 minute.
>
> What else can I do to try to diagnose this?
>
> Thanks,
>
> - Dmitry
>
