hbase-user mailing list archives

From: Jean-Daniel Cryans <jdcry...@apache.org>
Subject: Re: Analysing slow HBase mapreduce performance
Date: Wed, 17 Mar 2010 04:28:30 GMT

Out of interest... to what did you set it and what was the speed-up like?

J-D

On Tue, Mar 16, 2010 at 9:26 PM, Dmitry Chechik <dmitry@tellapart.com> wrote:
> That did it. Thanks!
>
> On Tue, Mar 16, 2010 at 9:14 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> Did you set scanner caching higher?
>>
>> J-D
>>
>> On Tue, Mar 16, 2010 at 9:10 PM, Dmitry <dmitry@tellapart.com> wrote:
>> > Hi all,
>> >
>> > I'm trying to analyse some issues with HBase performance in a mapreduce.
>> >
>> > I'm running a mapreduce which reads a table and just writes it out to HDFS.
>> > The table is small, roughly 400M of data and 18M rows.
>> > I've pre-split the table into 32 regions, so that I'm not running into the
>> > problem of having one region server serve the entire table.
>> >
>> > I'm running an HBase cluster with:
>> > - 16 region servers (each on the same machine as a Hadoop tasktracker and
>> > datanode).
>> > - 1 master (on the same machine as the Hadoop jobtracker and namenode.)
>> > - Zookeeper quorum of just 1 machine (on the same machine as the master).
>> >
>> > I have LZO compression enabled for both HBase and Hadoop.
>> >
>> > Running this job takes about 5-6 minutes.
>> >
>> > Running a mapreduce reading the exact same set of data from a SequenceFile
>> > on HDFS takes only about 1 minute.
>> >
>> > What else can I do to try to diagnose this?
>> >
>> > Thanks,
>> >
>> > - Dmitry
>> >
>>
>
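
Note on the fix discussed in this thread: scanner caching is the number of rows the client fetches from a region server per scanner "next" call. In HBase it can be raised with Scan.setCaching(int) on the Scan handed to the job, or through the hbase.client.scanner.caching configuration property. The thread does not say which value was used. The sketch below is not HBase code; it is a back-of-the-envelope model, assuming one round trip per batch of rows, of why the setting matters for the 18M-row table described above. The class name, method name, and sample caching values are made up for illustration.

```java
// Rough model only: each scanner fetch is assumed to return up to `caching`
// rows, so a full scan needs about ceil(rows / caching) round trips.
// Real scans add a little overhead (open, close, end-of-region checks).
final class ScannerRoundTrips {

    // Ceiling division: fetches needed to return `rows` rows, `caching` rows per fetch.
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 18_000_000L;               // row count quoted in the thread
        int[] cachingValues = {1, 100, 1000};  // illustrative values, not from the thread
        for (int caching : cachingValues) {
            System.out.printf("caching=%d -> about %d round trips%n",
                    caching, roundTrips(rows, caching));
        }
    }
}
```

With a caching value of 1, every row costs a network round trip, which would go some way toward explaining a gap like the 5-6 minutes versus 1 minute reported above. Larger values trade client memory per map task for fewer round trips.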
