hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-501) Empty region server address in info:server entry and a startcode of -1 in .META.
Date Sun, 09 Mar 2008 01:03:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576670#action_12576670 ]

Jim Kellerman commented on HBASE-501:
-------------------------------------

The reason for offlining the region when its construction failed is that, previously, the
master would keep trying to assign the region and the open would fail again each time. The
assumption is that the region is broken somehow. Doing something more elegant would require
HBase-fsck, which I thought was targeted for 0.2 and not 0.1.

Yes, it causes ISEs, but without knowing more about the failure it is hard to know where to
look to fix the broken region.

> Empty region server address in info:server entry and a startcode of -1 in .META.
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-501
>                 URL: https://issues.apache.org/jira/browse/HBASE-501
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.0, 0.2.0, 0.16.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.1.0
>
>         Attachments: master.log, noise-and-logging-0.1-v2.patch, noise-and-logging-and-coarse-fix-0.1-v3.patch,
noise.patch
>
>
> Manufactured a region with an empty server address and a startcode of -1 when a regionserver
was slow to open a region and the alternative regionserver that had been asked to open the region
failed and reported CLOSE to the master.
> Here's the long version of the story:
> Region is enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==.  Was originally on XX.XX.XX.184:60020
but this node ran out of memory (though it had 2G).
> {code}
> 2008-03-08 00:29:39,472 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020,
call batchUpdate(enwiki_071018,6q_ORe3mPzBTOnenVGS6zk==,1204860472398, 9223372036854775807,
org.apache.hadoop.hbase.io.BatchUpdate@126d2380) from XX.XX.XX.233:54292: error: java.io.IOException:
java.lang.OutOfMemoryError: Java heap space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.Object.clone(Native Method)
>         at java.lang.reflect.Method.getParameterTypes(Unknown Source)
>         at java.lang.Class.searchMethods(Unknown Source)
>         at java.lang.Class.getMethod0(Unknown Source)
>         at java.lang.Class.getMethod(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> 2008-03-08 00:29:39,472 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.newKeyIterator(Unknown Source)
>         at java.util.HashMap$KeySet.iterator(Unknown Source)
>         at java.util.HashSet.iterator(Unknown Source)
>         at sun.nio.ch.SelectorImpl.processDeregisterQueue(Unknown Source)
>         at sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:323)
> 2008-03-08 00:31:15,300 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020,
call batchUpdate(enwiki_080103_meta,,1204867086244, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@2d13981b)
from XX.XX.XX.233:54810: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap
space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.String.<init>(Unknown Source)
>         at java.lang.StringBuilder.toString(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:415)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> {code}
> The region was given to XX.XX.XX.227 at 00:36:20, but this server was busy replaying a large
number of edits (we need to stop emitting edits in HStore; 496 removed the output of skipped
edits). It could not bring the region up immediately. It took a long time.
> Then given to XX.XX.XX.183 at 00:37:26. It fails to open with:
> {code}
> 2008-03-08 00:37:29,827 INFO org.apache.hadoop.hbase.HRegion: compaction completed on
region enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810. Took 5sec
> 2008-03-08 00:37:29,943 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write
to file /hbase/aa0-005-2.u.powerset.com/enwiki_080103/1578810967/page/mapfiles/5679937491167886060/data
by DFSClient_-540201177
>         at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:341)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
> ...
> {code}
> 183 sends a CLOSE to the master.
> Then 227 says it has successfully opened the region.
> Master says region server XX.XX.XX.227:60020 should not have opened region enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> Now the server field in META is empty.
> {code}
> 2008-03-08 00:38:09,167 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.metaScanner
regioninfo: {regionname: enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985, startKey: <CzQ7UPCw-AoIn2JzSEN_pV==>,
endKey: <DUwzKe-niVjzlXs1SvrvVk==>, encodedName: 1578810967, offline: true, tableDesc:
{name: enwiki_080103, families: {anchor:={name: anchor, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: misc, max
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
page:={name: page, max versions: 3, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}, redirect:={name: redirect, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: , startCode:
-1
> {code}
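
The order of events in the description above can be replayed with a small toy model. This is only a sketch under stated assumptions, not HBase code: `ToyMaster`, `MetaRow`, `replay`, and the start code value `1L` are hypothetical names and values invented for illustration, and the master is assumed to remember only the most recent server it asked to open the region.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the event order in this report; not HBase code. */
public class RegionAssignmentSketch {

    /** Hypothetical stand-in for the region's row in .META. */
    static class MetaRow {
        String server = "";     // stands in for the info:server entry
        long startCode = -1L;   // start code of the hosting server
        boolean offline = false;
    }

    /** Hypothetical master that remembers only the latest server asked to open the region. */
    static class ToyMaster {
        final MetaRow meta = new MetaRow();
        final List<String> log = new ArrayList<>();
        String assignee; // null when no server is expected to open the region

        void assign(String server) {
            assignee = server;
            log.add("assigned to " + server);
        }

        void reportClose(String server) {
            if (!server.equals(assignee)) {
                return; // stale report, ignored in this sketch
            }
            // Failed open: take the region offline instead of reassigning it forever.
            meta.server = "";
            meta.startCode = -1L;
            meta.offline = true;
            assignee = null;
            log.add(server + " reported CLOSE, region offlined");
        }

        void reportOpen(String server, long startCode) {
            if (!server.equals(assignee)) {
                log.add(server + " should not have opened region");
                return; // the .META. row is left untouched
            }
            meta.server = server;
            meta.startCode = startCode;
            meta.offline = false;
            log.add(server + " opened region");
        }
    }

    /** Replays the order of events from the report, using a made-up start code. */
    static ToyMaster replay() {
        ToyMaster master = new ToyMaster();
        master.assign("XX.XX.XX.227:60020");          // slow: still replaying edits
        master.assign("XX.XX.XX.183");                // region handed to another server
        master.reportClose("XX.XX.XX.183");           // open failed on 183
        master.reportOpen("XX.XX.XX.227:60020", 1L);  // 227 finally finishes its open
        return master;
    }

    public static void main(String[] args) {
        ToyMaster master = replay();
        master.log.forEach(System.out::println);
        System.out.println("server: " + master.meta.server
            + ", startCode: " + master.meta.startCode
            + ", offline: " + master.meta.offline);
    }
}
```

In this model the late open from 227 is rejected because 227 is no longer the expected server, so the row stays in the state the CLOSE from 183 left it in: offline, with an empty server and a start code of -1. That is the same state the metaScanner log line shows.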

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

