hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: All regions stay on two nodes out of 18 nodes
Date Thu, 17 Apr 2014 16:54:37 GMT
The message cited is from
OpenRegionHandler#tryTransitionFromOpeningToFailedOpen().

'version 1' means the OpenRegionHandler instance was expecting the
corresponding znode to be at version 1.
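
To illustrate what an expected version means for a versioned node, here is a
minimal in-memory sketch in Python. It is not HBase or ZooKeeper client code:
the VersionedNode class, the transition() helper, and the OFFLINE starting
state are hypothetical names chosen for illustration. The state names OPENING
and FAILED_OPEN come from the log line quoted below. The sketch assumes the
usual ZooKeeper-style rule that a conditional write succeeds only when the
caller's expected version matches the node's current version.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the node's."""


class VersionedNode:
    """In-memory stand-in for a znode: a payload plus a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0  # incremented on every successful write

    def set_data(self, data, expected_version):
        # Conditional update: apply the write only if nobody else has
        # modified the node since the caller last read it.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


def transition(node, from_state, to_state, expected_version):
    """Hypothetical helper: change state, returning False on any mismatch."""
    if node.data != from_state:
        return False
    try:
        node.set_data(to_state, expected_version)
    except BadVersionError:
        return False
    return True


node = VersionedNode("OFFLINE")   # illustrative starting state, version 0
node.set_data("OPENING", 0)       # first write, version becomes 1

# The handler read the node at version 1, so it passes 1 as the expected version.
ok = transition(node, "OPENING", "FAILED_OPEN", expected_version=1)

# A second writer still holding version 1 is rejected, since the node moved on.
stale = transition(node, "FAILED_OPEN", "OPENED", expected_version=1)
```

In this model the number in "expecting version 1" is simply the version the
writer last observed and passes along with its update.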

Cheers


On Wed, Apr 16, 2014 at 10:29 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> BTW, the region server reported:
>
> 2014-04-16 11:30:31,890 INFO  [RS_OPEN_REGION-b05:60020-0]
> handler.OpenRegionHandler: Opening of region {ENCODED =>
> 6886ac98a71a47dc78a9e0ab5b3f07cd, NAME =>
> 'E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd.',
> STARTKEY => '', ENDKEY => '170000346762_20140315'} failed, transitioning
> from OPENING to FAILED_OPEN in ZK, expecting version 1
>
> What does "expecting version 1" indicate here?
>
>
> 2014-04-17 13:27 GMT+08:00 Tao Xiao <xiaotao.cs.nju@gmail.com>:
>
> > Take the region
> > E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd
> > for example.
> >
> > I checked the master's log and the region server (b05.jsepc.com) log, and
> > found that the master log has just 4 lines about that region, logged as
> > early as 2014-04-02.
> >
> > In the region server's log, there are more lines about that region, but
> > they are quite recent, e.g. 2014-04-16. It seems that the master lost
> > control of that region a long time ago, while the region server is still
> > managing that region although it cannot open it.
> >
> > The master log is here <http://pastebin.com/6J6v9tSg>, and the region
> > server log is here <http://pastebin.com/fbuu0RpC>.
> >
> >
> > 2014-04-17 9:34 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >
> >> You can pick a region which is stuck in transition, find which region
> >> server is hosting it, and search the region server log on that server.
> >>
> >> By correlating events from master and region server logs, you should see
> >> what is happening.
> >>
> >>
> >> On Wed, Apr 16, 2014 at 6:24 PM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> >> wrote:
> >>
> >> > Actually, if you open that link and click on the picture, it will zoom in
> >> > and become quite clear.
> >> >
> >> > I checked the HMaster UI just now and I am sure that these regions are
> >> > always in transition. I suppose some exceptions are occurring.
> >> > How can I prevent regions from staying in transition for a long time?
> >> >
> >> >
> >> > 2014-04-17 9:00 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >> >
> >> > > The picture is not very clear.
> >> > > I don't see E_MP_DAY_READ having regions in transition.
> >> > >
> >> > > Anyway, as long as there is a region in transition, the balancer will
> >> > > not run.
> >> > >
> >> > > Cheers
> >> > >
> >> > >
> >> > > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Ted,
> >> > > >
> >> > > > I can see some regions of other tables in transition now, but I'm not
> >> > > > sure how long they have been in transition. I will check the HBase
> >> > > > master UI later. Here is the screenshot:
> >> > > > <http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png>.
> >> > > > The screenshot shows one region in the FAILED_OPEN state, marked in
> >> > > > red, and 9 regions that have been in transition for more than 60
> >> > > > seconds.
> >> > > >
> >> > > > Note that the table whose regions all stay on 2 nodes is E_MP_DAY_READ,
> >> > > > while the other tables shown in the screenshot are named
> >> > > > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> >> > > > and so on.
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > >
> >> > > > 2014-04-16 23:10 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >> > > >
> >> > > > > bq. found some regions of other tables in transition, not of this
> >> > > > > table.
> >> > > > >
> >> > > > > That can explain why "balancer" command returned false.
> >> > > > > Are those regions stuck in transition ?
> >> > > > >
> >> > > > > Cheers
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > The command "balance_switch true" returns true, but the command
> >> > > > > > "balancer" returns false. I checked the HMaster UI and found some
> >> > > > > > regions of other tables in transition, but none of this table.
> >> > > > > >
> >> > > > > > This table's name is E_MP_DAY_READ. I grepped for it in the master
> >> > > > > > log and found only the following lines:
> >> > > > > >
> >> > > > > > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> >> > > > > > handler.ServerShutdownHandler: Skip assigning region
> >> > > > > > E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> >> > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> >> > > > > >
> >> > > > > > There are so few log lines about it, which looks strange.
> >> > > > > >
> >> > > > > >
> >> > > > > > BTW, I was able to spread the regions of this table evenly across the
> >> > > > > > whole cluster after I shut down the two region servers where the
> >> > > > > > regions of this table originally resided.
> >> > > > > >
> >> > > > > >
> >> > > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >> > > > > >
> >> > > > > > > Is the load balancer enabled?
> >> > > > > > >
> >> > > > > > > Can you grep for this table in the master log and pastebin what you
> >> > > > > > > found?
> >> > > > > > >
> >> > > > > > > Cheers
> >> > > > > > >
> >> > > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > I am using HDP 2.0.6 with 18 nodes (region servers). One of my HBase
> >> > > > > > > > tables has 50 regions, but I found that all 50 regions stay on just
> >> > > > > > > > two nodes instead of being spread evenly across the 18 nodes. I did
> >> > > > > > > > not pre-create splits, so this table was gradually split into 50
> >> > > > > > > > regions by itself.
> >> > > > > > > >
> >> > > > > > > > I'd like to know why all the regions stay on just two nodes rather
> >> > > > > > > > than the 18 nodes of the cluster, and how to spread the regions evenly
> >> > > > > > > > across all the region servers. Thanks.
