hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: long pauses during hbase.PerformanceEvaluation tests
Date Thu, 29 Oct 2009 16:32:32 GMT
14 minutes seems way too much, anything relevant in the region server
logs around the same time? (The one who holds the region at that
point)

J-D

On Thu, Oct 29, 2009 at 1:11 AM, Sujee Maniyam <sujee@sujee.net> wrote:
> forgot to add that I am running
>    hbase v0.20.1
>    hadoop v0.20.1
> pretty much at default settings...
>
> On Thu, Oct 29, 2009 at 12:01 AM, Sujee Maniyam <sujee@sujee.net> wrote:
>> Hi all
>>
>> I have been running 'bin/hbase
>> org.apache.hadoop.hbase.PerformanceEvaluation ' script to get an idea
>> of the cluster.  I see long PAUSES during writes.
>>
>> here is my setup:
>> - 5 nodes on EC2 (m1.large) :  1 hbase master + 4 regions
>> - hadoop & hbase get 2G heap
>>
>> - I don't see any JVM heap getting full (not even close)
>> - during the pauses, the CPU usage is pretty much zero (on client,
>> running on master machine & region servers)
>> - no swap going on
>> - network connections between machines are pretty fast
>>
>> I am curious to know what is causing the delay.  I don't see errors in logs.
>> Could region-splits be causing such long delays?  Because re-runs go
>> through much faster.
>>
>> Any pointers/explanations much appreciated.
>>
>> Here is a couple of examples:
>>
>> A)  SequentialWrite : TestTable created
>> notice the large gaps in timestamps (*)
>> # bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>>        ==> 201 writes / sec
>>
>>
>>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Table {NAME =>
>>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false',
>>    BLOCKCACHE => 'true'}]} created
>>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Start sequentialWrite
>>    at offset 0 for 1048576 rows
>>    09/10/28 23:18:10 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>>    09/10/28 23:18:24 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>>
>>  *  09/10/28 23:52:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>>    09/10/29 00:25:50 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>>    09/10/29 00:26:06 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>>    09/10/29 00:26:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>>
>>  *  09/10/29 00:41:47 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>>    09/10/29 00:43:34 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>>    09/10/29 00:43:49 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>>    09/10/29 00:44:43 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>>    09/10/29 00:44:44 INFO hbase.PerformanceEvaluation: Finished
>>    sequentialWrite in 5207981ms at offset 0 for 1048576 rows
>>
>>
>> During one of the pauses, here is what I see on one region server.
>> Notice how the table split took 14 mins
>>
>> @ hadoop4
>>    2009-10-29 00:26:33,925 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Starting split of region TestTable,0000088768,1256773950625
>>    2009-10-29 00:41:30,428 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000088768,1256773950625
>>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    region TestTable,0000088768,1256775993930/1603364931 available; sequence id
>>    is 58208257
>>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000088768,1256775993930
>>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    region TestTable,0000249152,1256775993930/1798535948 available; sequence id
>>    is 58208258
>>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000249152,1256775993930
>>    2009-10-29 00:41:32,422 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>    client connection,
>>    connectString=domU-12-31-39-06-B9-61.compute-1.internal:2181,domU-12-31-39-06-E0-62.compute-1.internal:2181,domU-12-31-39-07-8A-22.compute-1.internal:2181,domU-12-31-39-06-B0-32.compute-1.internal:2181,domU-12-31-39-06-3D-A1.compute-1.internal:2181
>>    sessionTimeout=60000
>>    watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@1f5b63822009-10-29
>>    00:41:32,422 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to
>>    server domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181
>>    2009-10-29 00:41:32,466 INFO org.apache.zookeeper.ClientCnxn: Priming
>>    connection to java.nio.channels.SocketChannel[connected
>>    local=/10.208.190.143:53364
>>    remote=domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181]2009-10-29
>>    00:41:32,467 INFO org.apache.zookeeper.ClientCnxn: Server connection
>>    successful
>>    2009-10-29 00:41:32,528 INFO
>>    org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META
>>    updated, and report to master all successful. Old region=REGION => {NAME
=>
>>    'TestTable,0000088768,1256773950625', STARTKE
>>    Y => '0000088768', ENDKEY => '', ENCODED => 1389834479, OFFLINE =>
true,
>>    SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME
=>
>>    'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>>    BLOCKSIZE => '6
>>    5536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, new regions:
>>    TestTable,0000088768,1256775993930, TestTable,0000249152,1256775993930.
>>    Split took 14mins, 58sec
>>    2009-10-29 00:41:34,379 INFO
>>    org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>    TestTable,00002 49152,1256775993930
>>    2009-10-29 00:41:34,380 INFO
>>    org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>>    MSG_REGION_OPEN: TestTable,0000249152,1256775993930
>>
>>
>>
>> B) 'TestTable' exists.  When I re-ran the test,
>>
>>  ==> 3898 writes / sec
>>
>>    09/10/29 04:28:22 INFO hbase.PerformanceEvaluation: Finished
>>    sequentialWrite in 269966ms at offset 0 for 1048576 rows
>>
>>
>> C) For kickers, here is a RandomWrite
>> (Table created)
>>
>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>>
>>      ==> 304 writes /sec
>>
>>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Table {NAME =>
>>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false',
>>    BLOCKCACHE => 'true'}]} created
>>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Start randomWrite at
>>    offset 0 for 1048576 rows
>>    09/10/29 05:34:01 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>>    09/10/29 05:34:15 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>>    09/10/29 05:35:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>>    09/10/29 05:36:25 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>>    09/10/29 05:36:38 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>>    09/10/29 05:37:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>>    09/10/29 05:37:37 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>>    09/10/29 05:37:51 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>>  * 09/10/29 05:56:47 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>>  * 09/10/29 06:31:12 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>>
>> 09/10/29 06:31:13 INFO hbase.PerformanceEvaluation: Finished randomWrite in
>> 3446240ms at offset 0 for 1048576 rows
>>
>>
>> D) randomWrite rerun, with existing TestTable
>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>>
>>        ==> 3075 writes / sec
>> 09/10/29 06:57:04 INFO hbase.PerformanceEvaluation: Finished
>> randomWrite in 341746ms at offset 0 for 1048576 rows
>>
>>
>> Thanks
>> Sujee
>>
>> --
>> http://sujee.net
>>
>
>
>
> --
> http://sujee.net
>

Mime
View raw message