hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: HDFS unbalance issue. (HBase over HDFS)
Date Wed, 25 Mar 2009 17:17:26 GMT
If you load your data into HDFS from node1, then node1 will always end up with more
blocks, because Hadoop writes one copy to the local datanode first and then copies it
to other nodes to meet the replication setting.

The same thing should be happening with HBase: wherever a region is open, when a
compaction happens the data should be written to the local datanode first and then
copied.

As for the datanodes always using CPU whenever compactions are going on: if your
replication is set to 3, then 3 datanodes are affected by each compaction.
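
A quick way to see this placement effect is to ask fsck where the block replicas of
your HBase files actually live. This is just a sketch: the /hbase path is assumed to
be your HBase root directory, and the exact fsck output format varies by Hadoop
version.

# list every block under the HBase root dir with its replica locations
$ bin/hadoop fsck /hbase -files -blocks -locations | less
# if most data was written from node1, node1's datanode should appear
# in nearly every block's replica list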

Billy




"schubert zhang" <zsongbo@gmail.com> wrote in 
message news:fa03480d0903250534u2ba56c2bm94c67d2e92a0e4e4@mail.gmail.com...
> Then, I stopped my application (the application writes to and reads from HBase).
> After one hour, when I came back to check the status of HDFS, some blocks had
> been deleted. Following is the current status.
>
> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.12:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
> 2956
> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.14:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
> 2962
>
> node1: 464518
> node2: 42495
> node3: 7505
> node4: 7205
> node5: 7636
>
> On each node, the datanode process is busy (according to top).
>
> I want to know the reason for these phenomena. Thanks.
>
> Schubert
>
> On Wed, Mar 25, 2009 at 6:37 PM, schubert zhang 
> <zsongbo@gmail.com> wrote:
>
>> From another point of view, I think HBase cannot control which node blocks are
>> deleted on; it just deletes files, and HDFS deletes the blocks wherever they are
>> located.
>>
>> Schubert
>>
>> On Wed, Mar 25, 2009 at 6:28 PM, schubert zhang 
>> <zsongbo@gmail.com> wrote:
>>
>>> Thanks Ryan. The balancer may take a long time.
>>>
>>> The block counts are too different. But maybe it is caused by HBase not
>>> deleting garbage blocks on regionserver1 and regionserver2, and maybe others.
>>>
>>> We grepped the hadoop logs and did not find any "deleting block" entries for
>>> node1 and node2.
>>>
>>> Following is the grep (grep -c "ask 10.24.1.1?:50010 to delete") result from
>>> the hadoop logs:
>>>
>>> namenode:
>>>
>>> -----grep -c "ask 10.24.1.12:50010 to delete"-----node1
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.12:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-23
>>> 4754
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.12:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-24
>>> 1062
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.12:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 0
>>>
>>> -----grep -c "ask 10.24.1.14:50010 to delete"-----node2
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.14:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 1494
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.14:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-23
>>> 3305
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.14:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-24
>>> 3385
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.14:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 1494
>>>
>>> -----grep -c "ask 10.24.1.16:50010 to delete"-----node3
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.16:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-23
>>> 8022
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.16:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-24
>>> 8238
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.16:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 4302
>>>
>>> -----grep -c "ask 10.24.1.18:50010 to delete"-----node4
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.18:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-23
>>> 8591
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.18:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-24
>>> 9111
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.18:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 5038
>>>
>>> -----grep -c "ask 10.24.1.20:50010 to delete"-----node5
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.20:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-23
>>> 3794
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.20:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log.2009-03-24
>>> 3946
>>> [schubert@nd0-rack0-cloud logs]$ grep -c "ask 10.24.1.20:50010 to delete" hadoop-schubert-namenode-nd0-rack0-cloud.log
>>> 2989
>>>
>>> So, I think it may be caused by HBase.
>>> I then grepped the datanode log of the node that shows zero "delete block" requests and found:
>>> [schubert@nd1-rack0-cloud logs]$ grep -c "Deleting block" hadoop-schubert-datanode-nd1-rack0-cloud.log.2009-03-24
>>> 104739
>>> [schubert@nd1-rack0-cloud logs]$ grep -c "Deleting block" hadoop-schubert-datanode-nd1-rack0-cloud.log.2009-03-23
>>> 465927
>>> [schubert@nd1-rack0-cloud logs]$ grep -c "Deleting block" hadoop-schubert-datanode-nd1-rack0-cloud.log
>>> 0
>>>
>>>
>>>
>>>
>>> On Wed, Mar 25, 2009 at 5:14 PM, Ryan Rawson 
>>> <ryanobjc@gmail.com> wrote:
>>>
>>>> Try
>>>> hadoop/bin/start-balancer.sh
>>>>
>>>> HDFS doesn't auto-balance. Balancing in HDFS requires moving data around,
>>>> whereas balancing in HBase just means opening a file on a different machine.
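
For reference, a minimal balancer invocation might look like the sketch below. The
-threshold flag is optional and the 10% value here is only an example.

# run from the Hadoop install dir on any node; moves block replicas between
# datanodes until each node's DFS usage is within 10% of the cluster average
$ bin/start-balancer.sh -threshold 10
# stop it early with bin/stop-balancer.sh if it competes too much with HBase traffic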
>>>>
>>>> On Wed, Mar 25, 2009 at 2:12 AM, schubert zhang 
>>>> <zsongbo@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi all,
>>>> > I am using hbase-0.19.1 and hadoop-0.19.
>>>> > My cluster has 5+1 nodes, and there are about 512 regions in HBase (256MB
>>>> > per region).
>>>> >
>>>> > But I found the blocks in HDFS are very unbalanced. Following is the status
>>>> > from the HDFS web GUI.
>>>> >
>>>> > (Note: I don't know if this mailing list can display HTML!)
>>>> >
>>>> > HDFS blocks:
>>>> > node1   509036 blocks
>>>> > node2   157937 blocks
>>>> > node3   15783   blocks
>>>> > node4   15117   blocks
>>>> > node5   20158   blocks
>>>> >
>>>> > But my HBase regions are very balanced.
>>>> > node1   88   regions
>>>> > node2   108 regions
>>>> > node3   111 regions
>>>> > node4   102 regions
>>>> > node5   105 regions
>>>> >
>>>> >
>>>> >
>>>> > Node             Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>> > nd1-rack0-cloud  0             In Service   822.8                     578.67     43.28              200.86          70.33     24.41          509036
>>>> > nd2-rack0-cloud  0             In Service   822.8                     190.02     42.96              589.82          23.09     71.68          157937
>>>> > nd3-rack0-cloud  0             In Service   822.8                     51.95      42.61              728.24          6.31      88.51          15783
>>>> > nd4-rack0-cloud  6             In Service   822.8                     46.19      42.84              733.77          5.61      89.18          15117
>>>> > nd5-rack0-cloud  1             In Service   1215.61                   52.37      62.91              1100.32         4.31      90.52          20158
>>>> >
>>>> >
>>>> > But my HBase regions are very balanced.
>>>> >
>>>> > Address                Start Code     Load
>>>> > nd1-rack0-cloud:60020  1237967027050  requests=383, regions=88, usedHeap=978, maxHeap=1991
>>>> > nd2-rack0-cloud:60020  1237788871362  requests=422, regions=108, usedHeap=1433, maxHeap=1991
>>>> > nd3-rack0-cloud:60020  1237788881667  requests=962, regions=111, usedHeap=1534, maxHeap=1991
>>>> > nd4-rack0-cloud:60020  1237788859541  requests=369, regions=102, usedHeap=1059, maxHeap=1991
>>>> > nd5-rack0-cloud:60020  1237788899331  requests=384, regions=105, usedHeap=1535, maxHeap=1983
>>>> > Total: servers: 5, requests=2520, regions=514
>>>> >
>>>>
>>>
>>>
>>
> 


