hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-501) Empty region server address in info:server entry and a startcode of -1 in .META.
Date Thu, 13 Mar 2008 18:06:24 GMT

     [ https://issues.apache.org/jira/browse/HBASE-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-501:
------------------------

    Attachment: 501-v7.patch

Here is v7.  It cleans up logging and stops us offlining a region when an IOE is thrown
on construction of HRegion.  It also adds logging to help us figure out anomalies such as
the 'hung regionserver'.

M  conf/hbase-default.xml
    Add hbase.hbasemaster.maxregionopen property.
M  src/java/org/apache/hadoop/hbase/HStore.java
    Change way we log.  Do way less.  Just emit sums of edits applied
    and skipped rather than individual edits.
M  src/java/org/apache/hadoop/hbase/HRegionServer.java
    Make sleeper instance a local rather than data member.
    (reportForDuty): Take a sleeper instance.
    (run): Removed redundant wrap of a 'for' by a 'while'.
    (constructor): If IOE, do not offline the region.  Seen to be
    an overreaction.
M  src/java/org/apache/hadoop/hbase/HLog.java
    Don't output the map of all files being cleaned every time a new
    entry is added; instead just log the new entry.  Remove the log
    message emitted every 10k edits.
M  src/java/org/apache/hadoop/hbase/HMaster.java
    Up default for maxregionopen.  Was seeing that playing edits
    could take a long time (mostly because we used to log every
    edit) but no harm in this being longer.  On REPORT_CLOSE,
    emit region info, not just region name, so we can see the properties
    (without them it was hard to figure who was responsible for offlining).
    Add logging of attempt # in shutdown processing.
    Add logging of state flags passed to the close region.  Helps
    debugging.  Also, on close, offline ONLY if we are NOT reassigning
    the region (jimk find).
M  src/java/org/apache/hadoop/hbase/util/Sleeper.java
    Add logging of extraordinary sleeps or calculated periods
    (suspicion is that we're sleeping way longer on loaded machines
    and the regionserver appears hung).
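
To illustrate the kind of check the Sleeper change describes, here is a minimal
sketch.  It is NOT the code in 501-v7.patch: the class name SleepCheck, the method
names, the tolerance parameter and the message text are all assumptions made up for
this example.  The idea is just to time the actual sleep and flag it when it
overshoots the requested period by more than some tolerance.

```java
/**
 * Hypothetical sketch only; not the Sleeper class from the patch.
 * Flags sleeps that ran much longer than the requested period.
 */
public class SleepCheck {

    /**
     * Returns a warning message if the actual sleep overshot the requested
     * period by more than toleranceMs, otherwise null.
     */
    public static String checkSleep(long periodMs, long sleptMs, long toleranceMs) {
        long overshoot = sleptMs - periodMs;
        if (overshoot > toleranceMs) {
            return "Slept " + sleptMs + "ms, " + overshoot
                + "ms longer than the scheduled period of " + periodMs + "ms";
        }
        return null;
    }

    /**
     * Sleeps for periodMs, measures how long the sleep really took, and
     * returns the warning from checkSleep (null if the sleep was on time).
     */
    public static String sleepAndCheck(long periodMs, long toleranceMs)
            throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(periodMs);
        long sleptMs = (System.nanoTime() - start) / 1_000_000L;
        return checkSleep(periodMs, sleptMs, toleranceMs);
    }
}
```

A caller would log any non-null result at WARN.  On a loaded machine, a run of such
warnings would support the suspicion that the thread is being starved rather than
the regionserver being truly hung.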

> Empty region server address in info:server entry and a startcode of -1 in .META.
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-501
>                 URL: https://issues.apache.org/jira/browse/HBASE-501
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.0, 0.2.0, 0.16.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.1.0
>
>         Attachments: 501-v5.patch, 501-v7.patch, master.log, noise-and-logging-0.1-v2.patch,
noise-and-logging-and-coarse-fix-0.1-v3.patch, noise-and-logging-fix-0.1-v4.patch~, noise.patch
>
>
> Manufactured a region with an empty server address and a startcode of -1 when a regionserver
was slow to open a region and the alternative regionserver that had been asked to open the region
failed and reported CLOSE to the master.
> Here's the long version of the story:
> Region is enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==.  Was originally on XX.XX.XX.184:60020
but this node ran out of memory (though it had 2G).
> {code}
> 2008-03-08 00:29:39,472 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020,
call batchUpdate(enwiki_071018,6q_ORe3mPzBTOnenVGS6zk==,1204860472398, 9223372036854775807,
org.apache.hadoop.hbase.io.BatchUpdate@126d2380) from XX.XX.XX.233:54292: error: java.io.IOException:
java.lang.OutOfMemoryError: Java heap space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.Object.clone(Native Method)
>         at java.lang.reflect.Method.getParameterTypes(Unknown Source)
>         at java.lang.Class.searchMethods(Unknown Source)
>         at java.lang.Class.getMethod0(Unknown Source)
>         at java.lang.Class.getMethod(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> 2008-03-08 00:29:39,472 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.newKeyIterator(Unknown Source)
>         at java.util.HashMap$KeySet.iterator(Unknown Source)
>         at java.util.HashSet.iterator(Unknown Source)
>         at sun.nio.ch.SelectorImpl.processDeregisterQueue(Unknown Source)
>         at sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:323)
> 2008-03-08 00:31:15,300 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020,
call batchUpdate(enwiki_080103_meta,,1204867086244, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@2d13981b)
from XX.XX.XX.233:54810: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap
space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.String.<init>(Unknown Source)
>         at java.lang.StringBuilder.toString(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:415)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> {code}
> Was given to XX.XX.XX.227 at 00:36:20 but this server is crazy replaying a bunch of edits
(Need to stop emitting edits in HStore -- 496 removed outputting skipped edits).  It can't
put the region up immediately.  Takes a long time. 
> Then given to XX.XX.XX.183 at 00:37:26. It fails to open with:
> {code}
> 2008-03-08 00:37:29,827 INFO org.apache.hadoop.hbase.HRegion: compaction completed on
region enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810. Took 5sec
> 2008-03-08 00:37:29,943 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write
to file /hbase/aa0-005-2.u.powerset.com/enwiki_080103/1578810967/page/mapfiles/5679937491167886060/data
by DFSClient_-540201177
>         at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:341)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
> ...
> {code}
> Sends a CLOSE to the master.
> Then 227 says it has successfully opened the region.
> Master says region server XX.XX.XX.227:60020 should not have opened region enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> Now the server field in META is empty.
> {code}
> 2008-03-08 00:38:09,167 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.metaScanner
regioninfo: {regionname: enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985, startKey: <CzQ7UPCw-AoIn2JzSEN_pV==>,
endKey: <DUwzKe-niVjzlXs1SvrvVk==>, encodedName: 1578810967, offline: true, tableDesc:
{name: enwiki_080103, families: {anchor:={name: anchor, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: misc, max
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
page:={name: page, max versions: 3, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}, redirect:={name: redirect, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: , startCode:
-1
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

