hbase-user mailing list archives

From Juhani Connolly <juh...@ninja.co.jp>
Subject Re: hbase performance
Date Fri, 02 Apr 2010 10:16:30 GMT
On 04/02/2010 06:09 PM, Chen Bangzhong wrote:
> my switch is Dell 2724.
>
>   
I'm not a network admin, and I can't tell how congested your network is
from the switch model alone (nor do I think that's possible, since a lot
of other factors are involved).

Try running the test on a single machine using the miniCluster flag;
this should eliminate network transfer as a factor. If you get high
throughput even though everything is running on a single machine, your
network is likely the issue. If, on the other hand, throughput goes down
significantly, the problem lies elsewhere.
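
One rough way to see how much load the benchmark puts on the network is to convert the figures quoted below into a payload data rate and compare it with the port speed of the switch. The sketch below is only arithmetic on the numbers from this thread (10000 rows per client, 20 clients, 100*1024-byte values, 3468628 ms). The function name is made up for illustration, and it counts value bytes only, ignoring row keys, RPC framing and any HDFS traffic behind the region servers.

```python
def payload_rate(rows_per_client, clients, value_bytes, elapsed_ms):
    """Return (writes per second, payload megabits per second)."""
    total_writes = rows_per_client * clients
    elapsed_s = elapsed_ms / 1000.0
    writes_per_s = total_writes / elapsed_s
    # Value bytes only: keys and protocol overhead are not counted.
    mbit_per_s = total_writes * value_bytes * 8 / elapsed_s / 1e6
    return writes_per_s, mbit_per_s


# Figures from the randomWrite run described in this thread.
writes_per_s, mbit_per_s = payload_rate(10000, 20, 100 * 1024, 3468628)
print(f"{writes_per_s:.1f} writes/s, {mbit_per_s:.1f} Mbit/s of payload")
```

This gives an average payload rate over the whole run, not a measure of congestion, so it is only a first check against the link speed.
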
> On 2 April 2010 at 5:04 PM, Chen Bangzhong <bangzhong@gmail.com> wrote:
>
>> On 2 April 2010 at 4:58 PM, Juhani Connolly <juhani@ninja.co.jp> wrote:
>>
>>> Your results seem very low, but your system specs are also quite
>>> moderate.
>>>
>>> On 04/02/2010 04:46 PM, Chen Bangzhong wrote:
>>>> Hi, All
>>>>
>>>> I am benchmarking hbase. My HDFS cluster includes 4 servers (Dell 860,
>>>> with 2 GB RAM). One NameNode, one JobTracker, 2 DataNodes.
>>>>
>>>> My HBase cluster also comprises 4 servers. One Master, 2 region servers
>>>> and one ZooKeeper. (Dell 860, with 2 GB RAM)
>>>>
>>> While I'm far from being an authority on the matter, running
>>> datanodes and regionservers together should help performance.
>>> Try turning your 2 datanodes + 2 regionservers into 4 servers, each
>>> running both a datanode and a regionserver.
>>>
>> I will try to run datanode and region server on the same server.
>>
>>>> I ran org.apache.hadoop.hbase.PerformanceEvaluation on the ZooKeeper
>>>> server. ROW_LENGTH was changed from 1000 to ROW_LENGTH = 100*1024,
>>>> so each value will be 100k in size.
>>>>
>>>> hadoop version is 0.20.2, hbase version is 0.20.3. dfs.replication is
>>>> set to 1.
>>>>
>>> Setting replication to 1 isn't going to give results that are very
>>> indicative of a "real" application, making it questionable as a
>>> benchmark. If you intend to run on a single replica at release, you'll
>>> be at high risk of data loss.
>>>
>> Since I have only 2 data nodes, I set replication to 1. In production, it
>> will be set to 3.
>>
>>>> The following is the command line:
>>>>
>>>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred
>>>> --rows=10000 randomWrite 20
>>>>
>>>> It took about one hour to complete the test (3468628 ms), about 60
>>>> writes per second. The performance seems disappointing.
>>>>
>>>> Is there anything I can do to make hbase perform better under 100k size?
>>>> I didn't try the methods mentioned in the performance wiki yet, because I
>>>> thought 60 writes/sec is too low.
>>>>
>>> Do you mean *over* 100k size?
>>> 2GB of RAM is pretty low and you'd likely get significantly better
>>> performance with more, though at this scale it probably isn't a
>>> significant problem.
>>>
>> The data size is exactly 100k.
>>
>>>> If the value size is 1k, hbase performs much better. 200000
>>>> sequentialWrite operations took about 16 seconds, about 12500 writes
>>>> per second.
>>>>
>>> Comparing sequentialWrite performance with randomWrite isn't a helpful
>>> indicator. Do you have randomWrite results for 1k values? The way your
>>> performance degrades with the size of the records suggests you may
>>> have a bottleneck at network transfer. What's rack locality like and how
>>> much bandwidth do you have between the servers?
>>>
>>>> Now I am trying to benchmark using two clients on 2 servers, no result
>>>> yet.
>>>>
>> For 1k data size, sequentialWrite and randomWrite performance
>> is about the same. All my servers are under one switch; I don't know the
>> switch bandwidth yet.
>>
>>> You're already running 20 clients on your first server with the
>>> PerformanceEvaluation. Do you mean you intend to run 20 on each?
>>>
>> In fact, it is 20 threads on one machine.
>>
>>> Hopefully someone with better knowledge can give a better answer, but my
>>> guess is that you have a network transfer bottleneck. Try doing further
>>> tests with randomWrite and decreasing value sizes, and see if the time
>>> correlates with the total amount of data written.
>   
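
The suggestion quoted above, checking whether run time tracks the total amount of data written, can be sketched as a small calculation. This is a minimal sketch and not part of PerformanceEvaluation: the function name and the tuple layout are assumptions made for illustration. The two runs listed are the only ones reported in this thread. The 1k run assumes the default ROW_LENGTH of 1000 bytes, and it was a sequential write, which the thread says performed about the same as randomWrite at that size.

```python
def seconds_per_megabyte(runs):
    """Map each run label to elapsed seconds per megabyte of value data.

    runs: iterable of (label, total_writes, value_bytes, elapsed_seconds).
    If run time is driven by data volume, the results should be similar
    across value sizes.
    """
    result = {}
    for label, total_writes, value_bytes, elapsed_s in runs:
        megabytes = total_writes * value_bytes / 1e6
        result[label] = elapsed_s / megabytes
    return result


# Only the two runs reported in this thread; add more value sizes as measured.
runs = [
    ("100k values", 200000, 100 * 1024, 3468.628),
    ("1k values", 200000, 1000, 16.0),  # assumes default ROW_LENGTH = 1000
]
for label, cost in seconds_per_megabyte(runs).items():
    print(f"{label}: {cost:.3f} s/MB")
```

With more value sizes measured, a roughly constant figure would point toward a transfer-bound workload, while a figure that grows with value size would suggest some other cost is involved.
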

