hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bijieshan <bijies...@huawei.com>
Subject A sudden msg of "java.io.IOException: Server not running, aborting"
Date Tue, 31 May 2011 11:41:33 GMT
It occurred in an RegionServer with an un-known reason. I have check this RegionServer logs,
 there's no prev-aborting,  and no other infos showed the RegionServer has aborted. So I saw
the following msg showed in a sudden.

>>[logs]
2011-05-25 09:15:44,232 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running
rollback of failed split of ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.;
java.io.IOException: Server not running, aborting
	at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

>>[description]
Before that time, everything seemed ok(it received so many open-region requests, and handled
them), except several Exception happened during the time.

>>[logs]
2011-05-25 09:15:36,214 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: ufdr,110018,1305617626846.f22eafaf06c3be7f7e5a447cec40c7fa.
2011-05-25 09:15:36,214 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: ufdr,111336,1305617636659.b47f1cf09b10aecaee6805a985646b30.
2011-05-25 09:15:36,215 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: ufdr,280884,1305617952747.7cac8ccc9c9770ef0e48807e07bc8bec.
2011-05-25 09:15:36,215 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: ufdr,280446,1305617949439.b332cfff565d77b2e2d4f1a2c52494b0.
2011-05-25 09:15:36,216 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received
request to open region: ufdr,101974,1305617623149.b60c71565f2c73b81477fe25ba80378b.
2011-05-25 09:15:36,413 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Attempting to transition node 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-05-25 09:15:36,423 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Successfully transitioned node 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING
to RS_ZK_REGION_OPENING
2011-05-25 09:15:37,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Attempting to transition node c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-05-25 09:15:37,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Attempting to transition node d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-05-25 09:15:37,660 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Successfully transitioned node c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING
to RS_ZK_REGION_OPENING
2011-05-25 09:15:37,662 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Attempting to transition node bd07db88de68c1022c5bba9e5471c9f9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-05-25 09:15:37,668 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321
Successfully transitioned node d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING
to RS_ZK_REGION_OPENING

>>[description]
>From the code, I saw that , only the Server has aborted, the msg of "Server not running,
aborting" could be shown. It determined by the param of HRegionServer#abortRequested, if it
was true, and if somewhere called the method of HRegionServer#checkOpen, this msg would been
printed.

  /**
   * Called to verify that this server is up and running.
   *
   * @throws IOException
   */
  protected void checkOpen() throws IOException {
    if (this.stopped || this.abortRequested) {
      throw new IOException("Server not running"
          + (this.abortRequested ? ", aborting" : ""));
    }
    if (!fsOk) {
      throw new IOException("File system not available");
    }
  }

For I said there's no aborted info been found in Logs:

public void abort(String reason, Throwable cause) {
    if (cause != null) {
      LOG.fatal("ABORTING region server " + this + ": " + reason, cause);
    } else {
      LOG.fatal("ABORTING region server " + this + ": " + reason);
    }
    this.abortRequested = true;
    this.reservedSpace.clear();
    if (this.metrics != null) {
      LOG.info("Dump of metrics: " + this.metrics);
    }
    stop(reason);
  }

For there's no prev-aborting found in the logs, how did the sudden msg of " java.io.IOException:
Server not running, aborting " occurred?
Thanks!

Jieshan Bean.



 

Mime
View raw message