hbase-user mailing list archives

From Akmal Abbasov <akmal.abba...@icloud.com>
Subject Re: HBase strange behaviour
Date Tue, 07 Jul 2015 07:58:37 GMT
> These znodes seemed to be related to YARN, not HBase. 
> 
> Maybe ask on yarn user mailing list ?
Right. Thank you.
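
For anyone finding this thread later: the two conditions that came up here (the balancer switch, and regions in transition) can be modelled roughly as below. This is a simplified Python sketch under my own assumptions, not HBase source code; `ClusterState` and `can_run_balancer` are hypothetical names invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClusterState:
    # Hypothetical, simplified stand-in for the state discussed in this thread.
    balancer_switch_on: bool    # toggled by `balance_switch true|false` in hbase shell
    regions_in_transition: int  # the value the ritCount metric reports


def can_run_balancer(state: ClusterState) -> Tuple[bool, str]:
    """Return (would_run, reason) for a manual balancer request."""
    if not state.balancer_switch_on:
        return False, "balancer switch is off"
    if state.regions_in_transition > 0:
        return False, "%d region(s) in transition" % state.regions_in_transition
    return True, "ok"


# The situation in this thread: no regions in transition, but the master log
# showed "set balanceSwitch=false", so a manual balancer run returned false.
print(can_run_balancer(ClusterState(balancer_switch_on=False, regions_in_transition=0)))

# After running `balance_switch true`:
print(can_run_balancer(ClusterState(balancer_switch_on=True, regions_in_transition=0)))
```

The switch is checked first in this sketch because that was the cause here: ritCount was already 0, and only turning the switch back on let the balancer run.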

> On 07 Jul 2015, at 09:50, Ted Yu <yuzhihong@gmail.com> wrote:
> 
> These znodes seemed to be related to YARN, not HBase. 
> 
> Maybe ask on yarn user mailing list ?
> 
> Cheers
> 
> 
> 
> On Jul 7, 2015, at 12:05 AM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
> 
>>> Have you run the following command in hbase shell ?
>>> balance_switch true
>> I’ve tried, and this did the trick. Thank you.
>> 
>> One more thing that is not clear to me is what I can do with the ~4000 znodes in
>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot
>> What will happen to them if I do nothing? Will the system try to complete
>> all of these applications?
>> 
>> Thank you.
>> 
>> 
>>> On 07 Jul 2015, at 00:16, Ted Yu <yuzhihong@gmail.com> wrote:
>>> 
>>> Have you run the following command in hbase shell ?
>>> balance_switch true
>>> 
>>> Cheers
>>> 
>>> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <akmal.abbasov@icloud.com>
>>> wrote:
>>> 
>>>>> Do you see in the master log something similar to the following ?
>>>>> 
>>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>> Yes, I have several of them, but all of them are from 3 days ago.
>>>> 
>>>> I checked the ‘ritCount’ metric, and it is 0. I also checked the
>>>> /hbase/region-in-transition znode, which is also empty.
>>>> But I can’t start the balancer manually.
>>>> 
>>>> I took a snapshot of the tables each hour.
>>>> I’ve checked the path
>>>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in ZooKeeper,
>>>> and there are ~4000 applications. It looks like all of them are
>>>> create-snapshot operations. I’ve also observed that the CPU
>>>> usage of the master is much higher than it was in the past.
>>>> Is it possible that all of these applications are causing the problem?
>>>> 
>>>> Can I delete all of these applications?
>>>> 
>>>> 
>>>>> On 06 Jul 2015, at 18:45, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> 
>>>>> Do you see in the master log something similar to the following ?
>>>>> 
>>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>>> 
>>>>> You can search backwards for balancer / assignment related logs.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <akmal.abbasov@icloud.com>
>>>>> wrote:
>>>>> 
>>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>>> you checked its log files ?
>>>>>> It was a VM, and I was not able to access it any more; I can’t log in to
>>>>>> it. Restarting several times didn’t help.
>>>>>> 
>>>>>> 
>>>>>>> Can you check master log around this time ? If there was region in
>>>>>>> transition, balancer wouldn't balance.
>>>>>> I have a lot of these:
>>>>>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>>>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>>>> 2015-07-06 15:15:39,921 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>>>> 2015-07-06 15:15:39,925 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>>>> 2015-07-06 15:15:39,926 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>>>> 2015-07-06 15:15:39,927 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>>>> 2015-07-06 15:15:39,928 INFO  [snapshot-log-cleaner-cache-refresher]
>>>>>> util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>>>> 2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18]
>>>>>> master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>>>>>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>>>>>> hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>>>>>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>>>>>> hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>>>> 2015-07-06 15:32:49,382 INFO  [FifoRpcScheduler.handler1-thread-14]
>>>>>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>> 2015-07-06 15:32:56,936 INFO  [FifoRpcScheduler.handler1-thread-1]
>>>>>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>> 
>>>>>> Thank you.
>>>>>> 
>>>>>>> On 06 Jul 2015, at 17:37, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>> 
>>>>>>> bq. I had to delete and recreate it
>>>>>>> 
>>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>>> you checked its log files ?
>>>>>>> 
>>>>>>> bq. start balancer manually, but it returned false
>>>>>>> 
>>>>>>> Can you check master log around this time ? If there was region in
>>>>>>> transition, balancer wouldn't balance.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <akmal.abbasov@icloud.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> I have a strange behaviour in my HBase cluster. I have 5 rs and 2
>>>>>>>> masters.
>>>>>>>> One of the rs stopped working, restart didn’t work, and I had to delete
>>>>>>>> and recreate it.
>>>>>>>> But when this rs stopped, the cluster also stopped functioning.
>>>>>>>> There were a lot of inconsistencies. When I recreated the rs with the disks
>>>>>>>> of the previous one, the cluster started working.
>>>>>>>> But now, only 3 rs host the regions; the other 2 have 0 regions.
>>>>>>>> I’ve tried to start balancer manually, but it returned false.
>>>>>>>> Any idea?
>>>>>>>> 
>>>>>>>> I am using HBase, version hbase-0.98.7-hadoop2.
>>>>>>>> Thank you.
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> Akmal Abbasov
>> 

