hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-501) Empty region server address in info:server entry and a startcode of -1 in .META.
Date Sat, 08 Mar 2008 23:29:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576654#action_12576654
] 

stack commented on HBASE-501:
-----------------------------

Ok.  Figured it out.  The exception on the regionserver side of the attached log that provokes
the CLOSE is a HDFS error. Here it is:

{code}
2008-03-08 00:37:28,096 DEBUG org.apache.hadoop.hbase.HStore: Applying edit <D496DdeK5YmdoXDJYKeCLF==/page:url/1204889856000=(page:url/1204889856000/http://en.wikipedia.org/wiki/Caesar_Takeshi)>
2008-03-08 00:37:28,096 DEBUG org.apache.hadoop.hbase.HStore: flushing reconstructionCache
2008-03-08 00:37:29,752 DEBUG org.apache.hadoop.hbase.HStore: moving 67943686/page/7182609840669926539
in hdfs://coral-dfs.cluster.powerset.com:10000/hbase/aa0-005-2.u.powerset.com/enwiki_071018/compaction.dir
to 67943686/page/4327824140768830867 in hdfs://coral-dfs.cluster.powerset.com:10000/hbase/aa0-005-2.u.powerset.com/enwiki_071018
for 67943686/page
2008-03-08 00:37:29,827 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region
enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810. Took 5sec
2008-03-08 00:37:29,943 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file
/hbase/aa0-005-2.u.powerset.com/enwiki_080103/1578810967/page/mapfiles/5679937491167886060/data
by DFSClient_-540201177
        at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:341)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.complete(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2292)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:51)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:67)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:932)
        at org.apache.hadoop.io.MapFile$Writer.close(MapFile.java:172)
        at org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:1103)
        at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:826)
        at org.apache.hadoop.hbase.HStore.<init>(HStore.java:720)
        at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:289)
        at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1154)
        at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1100)
        at java.lang.Thread.run(Unknown Source)
{code}

We're doing the reconstruction log replay.

An exception in the HRegion constructor at this stage falls in here over on the HRS side:

{code}
Index: HRegionServer.java
===================================================================
--- HRegionServer.java  (revision 634789)
+++ HRegionServer.java  (working copy)
@@ -1168,12 +1168,9 @@
       } catch (IOException e) {
         LOG.error("error opening region " + regionInfo.getRegionName(), e);
         // Mark the region offline.
         // TODO: add an extra field in HRegionInfo to indicate that there is
         // an error. We can't do that now because that would be an incompatible
         // change that would require a migration        
         regionInfo.setOffline(true);
         reportClose(regionInfo);
         return;
       }
{code}

We're setting the HRegionInfo instance offline flag to true.  We send that over to the HMaster.
  It, when processing the MSG_REPORT_CLOSE, if the passed HRegionInfo has an offline flag
set, intentionally sets the do-not-reassign flag:

{code}
          boolean reassignRegion = !region.isOffline();
{code}

So, edits or compaction messes up on HRS side, it breaks our table.  Any ongoing uploads break
to this region with ISEs.

> Empty region server address in info:server entry and a startcode of -1 in .META.
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-501
>                 URL: https://issues.apache.org/jira/browse/HBASE-501
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.0, 0.2.0, 0.16.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.1.0
>
>         Attachments: master.log, noise-and-logging-0.1-v2.patch, noise.patch
>
>
> Manufactured a region empty server address and a startcode of -1 when a regionserver
was slow to open a region and the alternative regionserver that had been asked open the region
fails and reports CLOSE to the master.
> Here's long version of story:
> Region is enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==.  Was originally on XX.XX.XX.184:60020
but this node ran out of memory (though it had 2G).
> {code}
> 2008-03-08 00:29:39,472 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020,
call batchUpdate(enwiki_071018,6q_ORe3mPzBTOnenVGS6zk==,1204860472398, 9223372036854775807,
org.apache.hadoop.hbase.io.BatchUpdate@126d2380) from XX.XX.XX.233:54292: error: java.io.IOException:
java.lang.OutOfMemoryError: Java heap space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.Object.clone(Native Method)
>         at java.lang.reflect.Method.getParameterTypes(Unknown Source)
>         at java.lang.Class.searchMethods(Unknown Source)
>         at java.lang.Class.getMethod0(Unknown Source)
>         at java.lang.Class.getMethod(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> 2008-03-08 00:29:39,472 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.newKeyIterator(Unknown Source)
>         at java.util.HashMap$KeySet.iterator(Unknown Source)
>         at java.util.HashSet.iterator(Unknown Source)
>         at sun.nio.ch.SelectorImpl.processDeregisterQueue(Unknown Source)
>         at sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:323)
> 2008-03-08 00:31:15,300 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020,
call batchUpdate(enwiki_080103_meta,,1204867086244, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@2d13981b)
from XX.XX.XX.233:54810: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap
space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.String.<init>(Unknown Source)
>         at java.lang.StringBuilder.toString(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:415)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> {code}
> Was given to XX.XX.XX.227 at 00:36:20 but this server is crazy replaying a bunch of edits
(Need to stop emitting edits in HStore -- 496 removed outputting skipped edits).  It can't
put the region up immediately.  Takes a long time. 
> Then given to XX.XX.XX.183 at 00:37:26. It fails to open with:
> {code}
> 2008-03-08 00:37:29,827 INFO org.apache.hadoop.hbase.HRegion: compaction completed on
region enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810. Took 5sec
> 2008-03-08 00:37:29,943 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write
to file /hbase/aa0-005-2.u.powerset.com/enwiki_080103/1578810967/page/mapfiles/5679937491167886060/data
by DFSClient_-540201177
>         at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:341)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
> ...
> {code}
> Sends a CLOSE to the master.
> Then 227 says its successfully opened region.
> Master says region server XX.XX.XX.227:60020 should not have opened region enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> Now the server field in META is empty.
> {code}
>  59 2008-03-08 00:38:09,167 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.metaScanner
regioninfo: {regionname: enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985, startKey: <CzQ7UPCw-AoIn2JzSEN_pV==>,
endKey: <DUwzKe-niVjzlXs1SvrvVk==>, encodedName: 1578810967, offline: true, tableDesc:
{name: enwiki_080103,         families: {anchor:={name: anchor, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: misc, max
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
page:={name: page, max versions: 3, compression: NONE, i        n memory: false, max length:
2147483647, bloom filter: none}, redirect:={name: redirect, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: , startCode:
-1
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message