Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <i2p9037c8b31004020204i46e4b178zc0f5f8b9b63b68ae@mail.gmail.com>
References: <n2u9037c8b31004020046mdc836ba5y49e7d87be34c303d@mail.gmail.com>
	 <4BB5B1BE.7090106@ninja.co.jp>
	 <i2p9037c8b31004020204i46e4b178zc0f5f8b9b63b68ae@mail.gmail.com>
Date: Fri, 2 Apr 2010 17:09:10 +0800
Message-ID: <j2w9037c8b31004020209p8c12ed85p913a285900971601@mail.gmail.com>
Subject: Re: hbase performance
From: Chen Bangzhong <bangzhong@gmail.com>
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0016364edf44123d0504833d55af

--0016364edf44123d0504833d55af
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

My switch is a Dell 2724.

On April 2, 2010 at 5:04 PM, Chen Bangzhong <bangzhong@gmail.com> wrote:

>
>
> On April 2, 2010 at 4:58 PM, Juhani Connolly <juhani@ninja.co.jp> wrote:
>
>> Your results seem very low, but your system specs are also quite
>> moderate.
>>
>> On 04/02/2010 04:46 PM, Chen Bangzhong wrote:
>> > Hi, All
>> >
>> > I am benchmarking hbase. My HDFS cluster includes 4 servers (Dell 860,
>> > with 2 GB RAM): one NameNode, one JobTracker, 2 DataNodes.
>> >
>> > My HBase cluster also comprises 4 servers: one Master, 2 region servers
>> > and one ZooKeeper (Dell 860, with 2 GB RAM).
>> >
>> While I'm far from being an authority on the matter, running
>> datanodes+regionservers together should help performance.
>> Try making your 2 datanodes + 2 regionservers into 4 servers running
>> both data/region.
>>
>
> I will try to run datanode and region server on the same server.
>
>
>> > I ran org.apache.hadoop.hbase.PerformanceEvaluation on the ZooKeeper
>> > server. ROW_LENGTH was changed from 1000 to ROW_LENGTH = 100*1024,
>> > so each value will be 100k in size.
>> >
>> > hadoop version is 0.20.2, hbase version is 0.20.3. dfs.replication is
>> > set to 1.
>> >
>> Setting replication to 1 isn't going to give results that are very
>> indicative of a "real" application, making it questionable as a
>> benchmark. If you intend to run on a single replica at release, you'll
>> be at high risk of data loss.
>>
>
> Since I have only 2 data nodes, I set replication to 1. In production, it
> will be set to 3.
>
>
>> > The following is the command line:
>> >
>> > bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred
>> > --rows=10000 randomWrite 20
>> >
>> > It took about one hour to complete the test (3468628 ms), about 60
>> > writes per second. The performance seems disappointing.
>> >
>> > Is there anything I can do to make hbase perform better under 100k
>> > size? I didn't try the methods mentioned in the performance wiki yet,
>> > because I thought 60 writes/sec is too low.
>> >
>> >
>> Do you mean *over* 100k size?
>> 2GB ram is pretty low and you'd likely get significantly better
>> performance with more, though on this scale it probably isn't a
>> significant problem.
>>
>
> The data size is exactly 100k.
>
>
>> > If the value size is 1k, hbase performs much better. 200000
>> > sequencewrite operations took about 16 seconds, about 12500 writes per
>> > second.
>> >
>> >
>> Comparing sequencewrite performance with randomwrite isn't a helpful
>> indicator. Do you have randomWrite results for 1k values? The way your
>> performance degrades with the size of the records seems like you may
>> have a bottleneck at network transfer? What's rack locality like and how
>> much bandwidth do you have between the servers?
>> > Now I am trying to benchmark using two clients on 2 servers, no result
>> > yet.
>> >
>> >
>>
>
> For 1k datasize, the sequencewrite and randomWrite performance is about
> the same. All my servers are under one switch; I don't know the switch
> bandwidth yet.
>
>
>>  You're already running 20 clients on your first server with the
>> PerformanceEvaluation. Do you mean you intend to run 20 on each?
>>
>
> In fact, it is 20 threads on one machine.
>
>>
>> Hopefully someone with better knowledge can give a better answer but my
>> guess is that you have a network transfer bottleneck. Try doing further
>> tests with randomWrite and decreasing value sizes and see if the time
>> correlates to the total amount of data written.
>>
>>
>
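
For reference, the figures quoted above can be turned into raw data rates
with a short calculation. This is only a rough sketch in Python: the
helper name is made up for illustration, and it assumes each of the 20
clients wrote --rows=10000 rows (200000 writes in total, which matches the
200000 figure given for the 1k test) and that 1k means the original
ROW_LENGTH of 1000 bytes.

```python
# Back-of-the-envelope throughput from the numbers quoted in this thread.
# Assumption: 20 clients x 10000 rows each = 200000 writes per test run.

def throughput(writes, value_bytes, seconds):
    """Return (writes per second, MB per second) for one test run."""
    writes_per_sec = writes / seconds
    mb_per_sec = writes_per_sec * value_bytes / 1e6
    return writes_per_sec, mb_per_sec

# 100k values, randomWrite: 3468628 ms total
large = throughput(200000, 100 * 1024, 3468.628)

# 1k values: about 16 seconds total
small = throughput(200000, 1000, 16.0)

print("100k values: %.1f writes/s, %.2f MB/s" % large)
print("1k values:   %.1f writes/s, %.2f MB/s" % small)
```

Under these assumptions the 100k test moves roughly 6 MB/s and the 1k
test roughly 12.5 MB/s. For comparison, a 100 Mbit/s link carries at most
12.5 MB/s and a 1 Gbit/s link at most 125 MB/s; the thread does not say
which applies here.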

--0016364edf44123d0504833d55af--