hbase-user mailing list archives

From bijieshan <bijies...@huawei.com>
Subject Re: A sudden msg of "java.io.IOException: Server not running, aborting"
Date Sun, 12 Jun 2011 09:03:44 GMT
Thanks J-D.
I have found the cause.
The original .META. location was 100-9, so when verification of the .META. region location
fails, the cached location is reset.

But an exception occurred during verification:

100-9 was running out of memory (OOM) at the time, so the call to verifyRegionLocation blocked.
After some time the thread was interrupted, and a ClosedByInterruptException was thrown.

So the OOM is the root cause of the problem.
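
To illustrate the failure mode, here is a minimal Java sketch of bounding a verification call with a timeout, so that a hung server (as 100-9 was during the OOM) is reported as a failed verification instead of blocking the caller. BoundedVerifier and verifyWithin are hypothetical names used only for illustration; they are not HBase APIs, and this is not how CatalogTracker is implemented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HBase.
class BoundedVerifier {
    /**
     * Runs the check on a separate daemon thread and returns false if it
     * throws, returns anything other than true, or does not finish in time.
     */
    static boolean verifyWithin(Callable<Boolean> check, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "verify-location");
            t.setDaemon(true); // never keep the JVM alive for a hung check
            return t;
        });
        Future<Boolean> result = pool.submit(check);
        try {
            return Boolean.TRUE.equals(result.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            // Timed out (e.g. remote server hung) or the check itself failed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            result.cancel(true); // interrupt the check if it is still running
            pool.shutdownNow();
        }
    }
}
```

A caller would treat a false result the same way as any other failed verification, for example by resetting its cached location.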

Jieshan Bean


Ah so 100-13 was the .META. holder but got a socket timeout trying to
talk to 100-9 which was the previous .META. holder.

It means that when it goes to do a put at 09:15:44,232 it must be
contacting that same region server since the location wasn't updated
(the first call failed on socket timeout). Can you check what's going
on with 100-9 during that time and if it's really shutting down?
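
As a rough illustration of the stale-location behaviour discussed here, the Java sketch below keeps a cached address and clears it only when a verification attempt fails, which is why a caller that never completes a verification keeps using the old address. CachedLocation, getVerified and peek are hypothetical names for this example, not the CatalogTracker API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical sketch of a verified location cache; not HBase code.
class CachedLocation {
    private final AtomicReference<String> cached = new AtomicReference<>();
    // Returns true if the server at the given address still hosts the region.
    private final Predicate<String> verifier;

    CachedLocation(Predicate<String> verifier) {
        this.verifier = verifier;
    }

    void set(String address) {
        cached.set(address);
    }

    /** Returns the cached address without verifying it (may be stale). */
    String peek() {
        return cached.get();
    }

    /** Returns the cached address if it verifies; otherwise resets it and returns null. */
    String getVerified() {
        String address = cached.get();
        if (address == null) {
            return null;
        }
        boolean valid;
        try {
            valid = verifier.test(address);
        } catch (RuntimeException e) {
            valid = false; // any verification failure invalidates the cache
        }
        if (!valid) {
            // Only clear the entry we checked; a concurrent update wins.
            cached.compareAndSet(address, null);
            return null;
        }
        return address;
    }
}
```

With a verifier that accepts only 100-13, a cache still holding 100-9 is reset on the first getVerified() call and has to be repopulated with the new address.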


On Tue, Jun 7, 2011 at 9:23 AM, bijieshan <bijieshan@huawei.com> wrote:
> Thanks J-D.
> Sorry for the long delay in replying!
> I checked the logs of the .META. regionserver, and the problem is indeed as you described.
> But I found another problem.
> The .META. region changed its address, but CatalogTracker kept caching the old address
> for a long time afterwards.
> So when another regionserver (not the one hosting .META.) splits a region, it sends an
> IPC put request, which is executed on the old regionserver.
> From the HMaster log, we can see that the region was opened on 100-13 at 09:02:34:
> 2011-05-25 08:37:03,908 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region .META.,,1.1028785192 on 157-5-100-9,20020,1306257984044
> 2011-05-25 09:02:34,334 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region .META.,,1.1028785192 on 157-5-100-13,20020,1306266506022
> 2011-05-25 09:15:57,649 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=157-5-100-9:20020; java.io.EOFException
> 2011-05-25 09:15:57,649 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
> From the RegionServer 100-13 log, we can see that at 09:15:44 the .META. address cached
> in CatalogTracker was still 100-9:
> 2011-05-25 09:15:44,232 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running of failed split of ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.; java.io.IOException: Server not running, aborting
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> 2011-05-25 09:15:44,232 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=11dc72d94c7a5a3d19b0c0c3c49624a5
> java.io.IOException: Call to 157-5-100-9/ failed on local exception:
>        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>        at $Proxy8.getRegionInfo(Unknown Source)
>        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
>        at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:272)
>        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
>        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
>        at org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1354)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:215)
> Caused by: java.nio.channels.ClosedByInterruptException
>        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
>        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:518)
>        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
>        ... 9 more
> Is this a bug? I think the cached location should be invalidated and reset immediately
> after the .META. address changes, but it is not.
> Thanks!
> Jieshan Bean