hbase-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: Ghost Regions Problem
Date Wed, 05 Aug 2020 08:36:56 GMT
I mentioned assigns as a possible solution for your original issue (before
you dropped/recreated/bulkloaded the original table). It obviously will never
work for these "ghost" regions, because they don't belong to any table.

Yes, a rolling restart of the masters will make them read state from meta
again. Can you confirm how you originally cleaned up the original problem,
especially whether you manually deleted regions from meta?
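
In case it helps, the rolling restart itself is just the usual stop/start on
each master, standbys first and the active one last. A minimal sketch, assuming
the stock scripts shipped with hbase:

    # run on each standby master first, then on the active master
    bin/hbase-daemon.sh stop master
    bin/hbase-daemon.sh start master

Whichever master then becomes active will rebuild its region states from
hbase:meta during initialisation.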

On Wed, 5 Aug 2020, 08:43 jackie macmillian <jackie.macmillian@gmail.com> wrote:

> Thanks for your response Wellington.
>
> The hbck2 assigns method unfortunately does not work here, due to the lack of
> a table descriptor both in the meta table and in memory. The actual table and
> most of its regions had been dropped successfully. When you try to assign the
> remaining ghost regions, they get stuck, as in HBASE-22780.
>
> One way to get rid of those regions is to create a new table with the old
> name. Suppose you have 4 ghost regions. If you create a 1-region table, those
> 4 ghosts attach after that 1 region, composing a 5-region table. After that
> you are able to disable that table and drop it successfully. However, as we
> have many tables & regions like this, it is very tedious to go through them
> all.
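>
> For the record, this absorb-and-drop workaround is roughly the following in
> the hbase shell (table and column family names here are just placeholders):
>
>     create 'old_table_name', 'cf'   # ghost regions attach to the new table
>     disable 'old_table_name'
>     drop 'old_table_name'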
>
> To cut a long story short, the hmaster assumes its in-memory representation
> of the meta table is intact, but in fact it is not. I need a way to force all
> masters to rebuild their in-memory representations from the clean hbase:meta
> table. Does a rolling restart of all masters do that, or do I have to shut
> all masters down to force them to run initialization on startup?
>
> Wellington Chevreuil <wellington.chevreuil@gmail.com> wrote on Tue, 4 Aug
> 2020 at 16:42:
>
> > >
> > > if you use hbck2 to bypass those locks but leave them as they are, it is
> > > only a cosmetic move; the regions won't really come online
> >
> > You can use the hbck2 *assigns* method to bring those regions online (it
> > accepts multiple regions as input).
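> > For instance, something like this (jar name and encoded region names are
> > placeholders):
> >
> >     hbase hbck -j hbase-hbck2.jar assigns <encoded_region_1> <encoded_region_2>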
> >
> > > i've read that master processes have some in-memory representation
> > > of the hbase:meta table
> > >
> > Yes, masters read the meta table only during initialisation; from there
> > onwards, since every change to meta is orchestrated by the active master,
> > it assumes its in-memory representation of the meta table is the truth.
> > What exact steps did you follow when you say you dropped those ghost
> > regions? If that involved any manual deletion of region dirs/files in
> > hdfs, or direct manipulation of the meta table via the client API, then
> > that explains the master inconsistency.
> >
> >
> > On Tue, 4 Aug 2020 at 12:51, jackie macmillian <jackie.macmillian@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > we have a cluster with hbase 2.2.0 installed on hadoop 2.9.2.
> > > a few weeks ago, we had some issues with our active/standby namenode
> > > selection, due to some network problems and the zkfc services competing to
> > > elect the active namenode. as a result, both of our namenodes became
> > > active for a short time and all the region server services restarted
> > > themselves. we managed to solve that issue with some adjustments to
> > > timeout parameters. but the story began afterwards.
> > > after the region servers completed their restarts, we saw that all our
> > > hbase tables had become unstable. for example, think of a table 200
> > > regions wide: 196 regions of that table came online, but 4 regions got
> > > stuck in an intermediate state like closing/opening. in the end, the
> > > tables got stuck in disabling/enabling states. on top of that, hbase had
> > > lots of procedure locks and the masterprocwals directory kept growing.
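> > > for reference, those stuck procedures and locks can be inspected from the
> > > hbase shell with the standard commands:
> > >
> > >     list_procedures
> > >     list_locks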
> > > to overcome that issue, i used hbck2 to release the stuck regions, and
> > > once i managed to enable a table, i created an empty copy of it from its
> > > descriptor and bulk loaded all the hfiles of the corrupt table into the
> > > new one. at this point, you might ask why i did not just use the
> > > re-enabled table. i couldn't, because although i was able to bypass the
> > > locked procedures, there were too many of them to resolve one by one. if
> > > you use hbck2 to bypass those locks but leave them as they are, it is
> > > only a cosmetic move; the regions won't really come online. so i thought
> > > it would be much faster to create a brand new table and load all the data
> > > into it. the bulk load was successful and the new table became online and
> > > scannable. the next step was to disable the old table and drop it. but,
> > > as the hmaster was dealing with lots of locks and procedures, i wasn't
> > > able to disable it; some regions remained stuck in disabling state again.
> > > so i decided to set that table's state to disabled with hbck2, and then i
> > > succeeded in dropping it.
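> > > roughly, the commands involved had this shape (jar name, pids, paths and
> > > table names are placeholders, not the exact ones we used):
> > >
> > >     hbase hbck -j hbase-hbck2.jar bypass -o <pid1> <pid2>
> > >     hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles <hdfs_hfile_dir> <new_table>
> > >     hbase hbck -j hbase-hbck2.jar setTableState <old_table> DISABLED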
> > > after i had put all my tables online and dropped all the old tables
> > > successfully, masterprocwals was the last stop on the way to a clean
> > > hbase, i thought :) i moved the masterprocwals directory aside and
> > > restarted the active master. the new master took control and voila! the
> > > master procedures & locks became clean, and all my tables were online as
> > > needed! i scanned the hbase:meta table and saw there were no regions
> > > other than the online ones.
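> > > "moving aside" above was just an hdfs rename, sketched here assuming the
> > > default /hbase root dir, followed by a restart of the active master:
> > >
> > >     hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.bak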
> > > until now.. remember those regions that were stuck and forced closed so
> > > that the tables could be disabled and dropped? now, when a region server
> > > crashes and restarts for some reason, the master tries to assign those
> > > regions to region servers. but the region servers decline the assignment,
> > > as there is no table descriptor for those regions. take a look at
> > > HBASE-22780 <https://issues.apache.org/jira/browse/HBASE-22780>; exactly
> > > the same problem is described there.
> > > i tried to create a 1-region table with the same name as the old table.
> > > it succeeded, and the ghost region followed that table. then i disabled
> > > and dropped them again successfully, and again confirmed that hbase:meta
> > > no longer has that region. but after a region server crash it comes back
> > > again from nowhere. so i figured out that when a region server goes down,
> > > the hmaster does not read the hbase:meta table to assign that server's
> > > regions to other servers. i've read that master processes keep some
> > > in-memory representation of the hbase:meta table in order to perform
> > > assignments as fast as possible. i can clean hbase:meta of those ghost
> > > regions as explained, but i have to force the masters to load this clean
> > > copy of hbase:meta into their in-memory representations. how can i
> > > achieve that? assume that i have cleaned the meta table; now what? a
> > > rolling restart of the hmasters? do the standby masters share the same
> > > in-memory meta table with the active one? if that's the case, i think a
> > > rolling restart wouldn't solve the problem.. or should i shut all the
> > > masters down and then start them again, to force them to rebuild their
> > > in-memory state from the meta table?
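> > > to be concrete, by "cleaning hbase:meta" i mean deleting the ghost region
> > > rows directly from the hbase shell, something like the following (the row
> > > key is a placeholder):
> > >
> > >     scan 'hbase:meta', {FILTER => "PrefixFilter('old_table_name,')"}
> > >     deleteall 'hbase:meta', '<ghost_region_row_key>'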
> > > any help would be appreciated.
> > > thank you for your patience :)
> > >
> > > jackie
> > >
> >
>
