hbase-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: Ghost Regions Problem
Date Tue, 04 Aug 2020 13:41:18 GMT
>
>  If you use hbck2 to bypass
> those locks but leave them as they are, it is only a cosmetic move;
> the regions won't actually come online
>
You can use the hbck2 *assigns* method to bring those regions online (it
accepts multiple regions as input).
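
For reference, assigns is run through the HBCK2 jar; roughly like this (the
jar path/version below is a placeholder, and the encoded region names are
just examples you would replace with the real ones from the master log or
from the region rows in hbase:meta):

  hbase hbck -j /path/to/hbase-hbck2-1.1.0.jar assigns <encoded_region_1> <encoded_region_2>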

> I've read that master processes have some in-memory representation
> of the hbase:meta table
>
Yes, the masters read the meta table only during initialisation; from there
onwards, since every change to meta is orchestrated by the active master, it
assumes its in-memory representation of the meta table is the truth. What
exact steps did you follow when you say you dropped those ghost regions? If
that involved any manual deletion of region dirs/files in HDFS, or direct
manipulation of the meta table via the client API, then that explains the
master inconsistency.
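
If you want to double-check what meta currently holds for a table, something
like this in the hbase shell lists the region rows and their recorded states
(the table name is just an example; this assumes the 2.x meta layout with the
info:state column):

  scan 'hbase:meta', {ROWPREFIXFILTER => 'my_table,', COLUMNS => ['info:regioninfo', 'info:state']}

Comparing that output with what the master UI reports is a quick way to spot
the kind of inconsistency described below.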


On Tue, 4 Aug 2020 at 12:51, jackie macmillian <jackie.macmillian@gmail.com> wrote:

> Hi all,
>
> We have a cluster with HBase 2.2.0 installed on Hadoop 2.9.2.
> A few weeks ago we had some issues with our active/standby namenode
> election, due to network problems and the competition between the ZKFC
> services to elect the active namenode. As a result, both of our namenodes
> became active for a short time and all of the region server services
> restarted themselves. We managed to solve that issue with some adjustments
> to the timeout parameters, but the story began afterwards.
>
> After the region servers completed their restarts, we saw that all our
> HBase tables had become unstable. For example, take a table with 200
> regions: 196 regions of that table came online, but 4 regions got stuck in
> an intermediate state like closing/opening. In the end, the tables got
> stuck in disabling/enabling states. On top of that, HBase had lots of
> procedure locks and the MasterProcWALs directory kept growing.
> To overcome that issue, I used hbck2 to release the stuck regions, and once
> I managed to enable the table, I created an empty copy of it from its
> descriptor and bulk loaded all the hfiles of the corrupt table into the new
> one. At this point you might ask why I did not just use the re-enabled
> table. I couldn't, because although I was able to bypass the locked
> procedures, there were too many of them to resolve one by one. If you use
> hbck2 to bypass those locks but leave them as they are, it is only a
> cosmetic move; the regions won't actually come online. So I thought it
> would be much faster to create a brand new table and load all the data into
> it. The bulk load was successful and the new table became online and
> scannable.
>
> The next step was to disable the old table and drop it. But, as the HMaster
> was dealing with lots of locks and procedures, I wasn't able to disable it;
> some regions got stuck in the disabling state again. So I decided to set
> that table's state to disabled with hbck2, and then I succeeded in dropping
> it.
> After I got all my tables online and all my old tables dropped
> successfully, MasterProcWALs was the last stop on the way to a clean HBase,
> I thought :) I moved the MasterProcWALs directory aside and restarted the
> active master. The new master took control and voila! The master procedures
> and locks were cleared, and all my tables were online as needed. I scanned
> the hbase:meta table and saw that there were no regions other than the
> online ones.
>
> Until now... Remember those regions that were stuck and had to be forced
> closed so the tables could be disabled and dropped? Now, when a region
> server crashes and restarts for some reason, the master tries to assign
> those regions to region servers, but the region servers decline the
> assignment because there is no table descriptor for those regions. Take a
> look at HBASE-22780
> <https://issues.apache.org/jira/browse/HBASE-22780>; exactly the same
> problem is reported there.
> I tried to create a single-region table with the same name as the old
> table. It succeeded, and the ghost region followed that table. I then
> disabled and dropped it again successfully, and again confirmed that
> hbase:meta no longer has that region. But after a region server crash it
> comes back out of nowhere. So I figured out that when a region server goes
> down, the HMaster does not read the hbase:meta table to assign that
> server's regions to other servers. I've read that master processes have
> some in-memory representation of the hbase:meta table in order to perform
> assignments as fast as possible.
>
> I can clean hbase:meta of those ghost regions as explained, but I have to
> force the masters to load this clean copy of hbase:meta into their
> in-memory representations. How can I achieve that? Assume that I have
> cleaned the meta table; now what? A rolling restart of the HMasters? Do
> standby masters share the same in-memory meta table with the active one?
> If that's the case, I think a rolling restart wouldn't solve the problem...
> Or should I shut all masters down and then start them again, to force them
> to rebuild their in-memory state from the meta table?
>
> Any help would be appreciated.
> Thank you for your patience :)
>
> jackie
>
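
For completeness, the hbck2 steps described in the quoted message look
roughly like the following (jar path/version, procedure ids and table name
are only placeholders; check the options supported by your
hbase-operator-tools version before running bypass with
--override/--recursive):

  hbase hbck -j /path/to/hbase-hbck2-1.1.0.jar bypass --override --recursive <pid1> <pid2>
  hbase hbck -j /path/to/hbase-hbck2-1.1.0.jar setTableState my_old_table DISABLED

Keep in mind that bypass only marks the procedures as finished without
executing them; as noted above, the regions still need an explicit assigns
(or the table a drop) afterwards.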

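Similarly, the bulk load into the replacement table and the MasterProcWALs
sideline would be along these lines (paths, table names and the hbase.rootdir
of /hbase are assumptions, and the bulk load class was
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles in older releases):

  hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /staging/my_old_table_hfiles my_new_table
  hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.sidelined
  # then restart the active master process on its host, e.g.
  bin/hbase-daemon.sh restart master

Sidelining MasterProcWALs discards all pending procedure state, so it really
is a last resort.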