hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Repeating log message in a [custom] unit test
Date Tue, 19 Apr 2011 18:57:13 GMT
After some more digging: the reason it stays stuck is that the
DaughterOpener thread uses the region server's CatalogTracker, which
has a default timeout of Integer.MAX_VALUE, and it was stuck in this
code:

      while(!stopped && !metaAvailable.get() &&
          (timeout == 0 || System.currentTimeMillis() < stop)) {
        if (getMetaServerConnection(true) != null) {
          return metaLocation;
        }
        metaAvailable.wait(timeout == 0 ? 50 : timeout);
      }

My guess is that getMetaServerConnection was called, couldn't find
.META. and then -ROOT-, so it returned null and the thread started
waiting for the full timeout. Instead we should wait in increments
(always sleep a small amount of time, up to the specified timeout).
That way a later iteration of the loop would have seen that the server
was stopped.
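
A rough sketch of what waiting in increments could look like. This is
not the actual CatalogTracker code or a proposed patch: the class
IncrementalWait, the constant WAIT_SLICE_MS, and the methods
waitForMeta, stop and markAvailable are hypothetical names made up for
illustration, and the connection check is left out. The point is only
that each wait is capped at a small slice, so the stopped flag gets
re-checked even when the timeout is Integer.MAX_VALUE.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the tracker; NOT the real CatalogTracker.
class IncrementalWait {
    // Longest single wait; 50 ms mirrors the value in the quoted loop.
    static final long WAIT_SLICE_MS = 50;

    private final AtomicBoolean metaAvailable = new AtomicBoolean(false);
    private volatile boolean stopped = false;

    // Returns true if meta became available, false on stop or timeout.
    // As in the quoted loop, timeout == 0 means "no deadline".
    boolean waitForMeta(long timeout) {
        long stop = System.currentTimeMillis() + timeout;
        synchronized (metaAvailable) {
            while (!stopped && !metaAvailable.get()) {
                long remaining = (timeout == 0)
                    ? WAIT_SLICE_MS
                    : stop - System.currentTimeMillis();
                if (remaining <= 0) {
                    break; // deadline reached
                }
                try {
                    // Never wait longer than one slice, whatever the timeout.
                    metaAvailable.wait(Math.min(WAIT_SLICE_MS, remaining));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return metaAvailable.get();
        }
    }

    // Deliberately does not notify: the waiter must notice on its own.
    void stop() {
        stopped = true;
    }

    void markAvailable() {
        synchronized (metaAvailable) {
            metaAvailable.set(true);
            metaAvailable.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        IncrementalWait tracker = new IncrementalWait();
        Thread stopper = new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            tracker.stop();
        });
        stopper.start();
        // Returns shortly after stop() despite the huge timeout.
        boolean available = tracker.waitForMeta(Integer.MAX_VALUE);
        System.out.println("metaAvailable=" + available);
        stopper.join();
    }
}
```

With a single wait(timeout) the thread would only wake on a notify or
after the whole timeout; here the worst-case delay before noticing the
stop is one slice.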

J-D

On Tue, Apr 19, 2011 at 11:04 AM, Jean-Daniel Cryans
<jdcryans@apache.org> wrote:
> So you have your special lucene region that's opened on some region
> server and when the master starts shutting down, it doesn't seem to
> see it because while closing regions it says:
>
> 2011-04-18 21:35:09,221 INFO  [IPC Server handler 4 on 32141]
> master.ServerManager(283): Only catalog regions remaining; running
> unassign
>
> But the region is still assigned. I see that first it did:
>
> 2011-04-18 21:35:08,474 INFO
> [RegionServer:0;j-laptop,56437,1303187684214.compactor]
> regionserver.SplitTransaction(207): Starting split of region
> lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76.
>
> and just moments after that:
>
> 2011-04-18 21:35:08,477 INFO  [main] hbase.HBaseTestingUtility(410):
> Shutting down minicluster
>
> and splitting is still going on, it's eventually done when this is printed:
>
> 2011-04-18 21:35:08,621 INFO
> [RegionServer:0;j-laptop,56437,1303187684214.compactor]
> catalog.MetaEditor(85): Offlined parent region
> lucene,,1303187697156.d9ccbf93327587883207d3151bd74e76. in META
>
> Calling a split is async, so your client doesn't wait for that
> operation to end so it goes forward and the test ends (by closing the
> cluster). It seems to be a bug, if any region is split and the parent
> is marked offline when no other region is opened, the master will
> start closing root and meta which screws the opening of the daughters:
>
> 2011-04-18 21:35:09,853 INFO
> [j-laptop,56437,1303187684214-daughterOpener=6ee9617a0f64eeeca10c6807eb807b84]
> catalog.CatalogTracker(441): Failed verification of .META.,,1 at
> address=j-laptop:56437;
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region is not
> online: .META.,,1
>
> Please open a jira.
>
> J-D
>
