hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-501) Empty region server address in info:server entry and a startcode of -1 in .META.
Date Sat, 08 Mar 2008 22:29:46 GMT

     [ https://issues.apache.org/jira/browse/HBASE-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-501:
------------------------

    Attachment: noise-and-logging-0.1-v2.patch

HBASE-27 is just about some of the cells in meta being empty, usually the regioninfo, server,
and startcode, leaving split cell vestiges.  Nothing's 'broken'.

This is about a scenario that manufactures an offlined region in an online table.  The region
has a regioninfo in META but it's offlined; there is no startcode or server (they show as -1 and
empty in the master log respectively) and we never get around to reassigning this region.  It just
sits there in the middle of the table, breaking it.
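
One way to spot such regions is to scan the master log for metaScanner lines that show the three symptoms together. Below is a minimal sketch in Python, assuming each entry sits on a single line laid out like the master.log excerpt in the issue description; the function name and the regular expressions are illustrative and are not part of HBase.

```python
import re

# Assumed layout of an HMaster.metaScanner DEBUG line, based on the
# master.log excerpt quoted in the issue description (one entry per line).
_REGION = re.compile(r"regionname: (?P<name>.+?), startKey:")
_OFFLINE = re.compile(r"offline: (?P<offline>true|false)")
_TAIL = re.compile(r"server: (?P<server>[^,\s]*), startCode:\s*(?P<code>-?\d+)\s*$")


def stuck_offline_region(line):
    """Return the region name if the line shows the symptom, else None.

    The symptom: offline is true, the server field is empty and the
    startcode is -1.
    """
    if "HMaster.metaScanner regioninfo:" not in line:
        return None
    region = _REGION.search(line)
    offline = _OFFLINE.search(line)
    tail = _TAIL.search(line)
    if not (region and offline and tail):
        return None  # line does not match the assumed layout
    if (offline.group("offline") == "true"
            and tail.group("server") == ""
            and tail.group("code") == "-1"):
        return region.group("name")
    return None
```

Running this over every line of a master log and collecting the non-None results would give a list of candidate regions to check by hand.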

When we have one of these, we get ISAs in client -- seen by a few folks, Lars for one -- and
the offlined regions have also been noticed by others.

Looks like it's the CLOSE that offlines the region.  Should it?  I suppose that's fair, but we
don't put it back on to the unassigned list.  We don't see the 'reassign region' log message.
It's as though the region was on the kill or delete list when CLOSE is running, but I can't
see how it got there.
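
To make the suspected behavior concrete, here is a toy model in Python. It is purely an assumption-laden sketch: the class, its fields and the method name are invented for illustration and do not mirror the real HMaster code. It only shows the two outcomes discussed above: a CLOSE that offlines the region and re-queues it, versus a CLOSE that leaves it offline because the region is (perhaps wrongly) on a kill or delete list.

```python
class ToyMaster:
    """Toy model only; none of these names come from HMaster."""

    def __init__(self):
        self.meta = {}            # region -> {"offline", "server", "startcode"}
        self.unassigned = set()   # regions waiting to be given to a regionserver
        self.to_delete = set()    # regions deliberately being killed or deleted

    def handle_close(self, region):
        """Process a CLOSE report; return True if the region was re-queued."""
        row = self.meta[region]
        # The CLOSE offlines the region and clears its location.
        row["offline"] = True
        row["server"] = ""
        row["startcode"] = -1
        if region in self.to_delete:
            # Left offline on purpose. If the region is on this list by
            # mistake, it stays stuck, which matches the symptom described.
            return False
        # Expected path: hand the region back for reassignment.
        self.unassigned.add(region)
        return True
```

In this model the stuck region corresponds to `handle_close` returning False for a region nobody meant to delete, which is why logging around CLOSE is the first thing to add.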

Adding new noise patch.  Adds in logging around CLOSE.

This is a sample of the 'noise' the patch cleans up in HLog (the log just keeps outputting this
map every time a new element is added):

2008-03-08 00:35:21,886 DEBUG org.apache.hadoop.hbase.HLog: Creating new log file writer for
path hdfs://coral-dfs.cluster.powerset.com:10000/hbase/aa0-005-2.u.powerset.com/enwiki_080103/2055991669/oldlogfile.log;
map content {enwiki_080103_meta,dy9fcHV_BBQzozASgqoQdk==,1204867224716=org.apache.hadoop.io.SequenceFile$Writer@5e956133,
enwiki_071018,65DdQqrq_BtbmFFhMTmyqF==,1199838155829=org.apache.hadoop.io.SequenceFile$Writer@66e9a2c4,
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985=org.apache.hadoop.io.SequenceFile$Writer@1c0fdfec,
enwiki_071018,2ZRmAYSs97F__sLy5-ADMV==,1199837444576=org.apache.hadoop.io.SequenceFile$Writer@3456dafb,
enwiki_080103,,1204865363666=org.apache.hadoop.io.SequenceFile$Writer@466668f9, enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810=org.apache.hadoop.io.SequenceFile$Writer@7bac9f86,
enwiki_080103_meta,AyBTC9HXLLw5rxUy675O0-==,1204867122628=org.apache.hadoop.io.SequenceFile$Writer@5fb90875,
enwiki_071018,CeAIzFFdqzhyiJQOseL_n-==,1204883192108=org.apache.hadoop.io.SequenceFile$Writer@3eb2492,
enwiki_071018,avwxS5T7D4_OCPeI4F17Fk==,1199837870994=org.apache.hadoop.io.SequenceFile$Writer@381578fa,
enwiki_080103_meta,4zByyv3_p2JHDADZdbXD4-==,1204867058338=org.apache.hadoop.io.SequenceFile$Writer@7780cea1,
enwiki_071018,PperfeKM7-cS9psy-tzmSk==,1199837513465=org.apache.hadoop.io.SequenceFile$Writer@71fc1432,
enwiki_071018,nBPrcMh5Sfi2H7KSrsSgkV==,1199838451040=org.apache.hadoop.io.SequenceFile$Writer@5c921914,
enwiki_080103_meta,KyceV9ER2CAm3B-qYaoASF==,1204867238538=org.apache.hadoop.io.SequenceFile$Writer@6d75d78a,
enwiki_080103,8-CxZA7pnVnBSrjlmICoq-==,1204863457723=org.apache.hadoop.io.SequenceFile$Writer@5399dd2a,
enwiki_071018,y0WpQcE85bGuHBk5NbhDtV==,1197675580338=org.apache.hadoop.io.SequenceFile$Writer@48f093f8,
enwiki_071018_meta,nxL_BrEZ6q2ooIo06-2g1F==,1199839770536=org.apache.hadoop.io.SequenceFile$Writer@1bd5211,
enwiki_071018_meta,jx6p_Uek6ral-F0X3rZnoV==,1199839770535=org.apache.hadoop.io.SequenceFile$Writer@2e3414dc,
enwiki_080103_meta,,1204867086244=org.apache.hadoop.io.SequenceFile$Writer@297de952, enwiki_071018,4b-F0-foVXo03XKFzZHeh-==,1204860659625=org.apache.hadoop.io.SequenceFile$Writer@3242af95,
enwiki_080103,EWNE3bg-K_93e7xaUTGNmk==,1204863572529=org.apache.hadoop.io.SequenceFile$Writer@1a871b47,
enwiki_071018,aR_SC5R5QuEDFSdmlRsk5V==,1199837870994=org.apache.hadoop.io.SequenceFile$Writer@74bd26a4,
enwiki_071018,D4cfNPUvLY9KG8OEZOag0F==,1197675115854=org.apache.hadoop.io.SequenceFile$Writer@5d458f36,
enwiki_080103_meta,Oz2-mF71uv4gsfgrX4-bZ-==,1204866969564=org.apache.hadoop.io.SequenceFile$Writer@9611bc6,
enwiki_071018,6q_ORe3mPzBTOnenVGS6zk==,1204860472398=org.apache.hadoop.io.SequenceFile$Writer@26a3a6f4,
enwiki_080103_meta,Ex3rZAu2sKFj2Bij8bVoZF==,1204867006209=org.apache.hadoop.io.SequenceFile$Writer@21208bc8,
enwiki_071018_meta,Ey13Xkhj5QxDjHcAzaoEkk==,1197679241539=org.apache.hadoop.io.SequenceFile$Writer@4225f0fd}
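
To get a feel for how much a single one of these lines carries, the writers in the dump can be counted per table. This is a rough sketch in Python, assuming the `map content {...}` layout shown above; the helper name is made up for illustration. Region names contain commas and `=` padding, so the pattern anchors on the writer class name rather than splitting on separators.

```python
import re
from collections import Counter

# One map entry looks like "<table>,<startrow>,<timestamp>=<writer>".
# The region name itself contains ',' and '=', so anchor on the writer part.
_ENTRY = re.compile(
    r"(\S+?)=org\.apache\.hadoop\.io\.SequenceFile\$Writer@[0-9a-f]+")

_MARKER = "map content {"


def writers_per_table(log_text):
    """Count the writers listed in a 'map content {...}' dump, grouped by table."""
    start = log_text.find(_MARKER)
    if start == -1:
        return Counter()
    body = log_text[start + len(_MARKER):]
    counts = Counter()
    for region_name in _ENTRY.findall(body):
        # The table name is everything before the first comma.
        table = region_name.split(",", 1)[0]
        counts[table] += 1
    return counts
```

Summing the counts gives the number of map entries repeated in the log each time a new writer is created.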

> Empty region server address in info:server entry and a startcode of -1 in .META.
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-501
>                 URL: https://issues.apache.org/jira/browse/HBASE-501
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.0, 0.2.0, 0.16.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.1.0
>
>         Attachments: master.log, noise-and-logging-0.1-v2.patch, noise.patch
>
>
> Manufactured a region with an empty server address and a startcode of -1 when a regionserver
was slow to open a region and the alternative regionserver that had been asked to open the region
failed and reported CLOSE to the master.
> Here's the long version of the story:
> Region is enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==.  Was originally on XX.XX.XX.184:60020
but this node ran out of memory (though it had 2G).
> {code}
> 2008-03-08 00:29:39,472 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020,
call batchUpdate(enwiki_071018,6q_ORe3mPzBTOnenVGS6zk==,1204860472398, 9223372036854775807,
org.apache.hadoop.hbase.io.BatchUpdate@126d2380) from XX.XX.XX.233:54292: error: java.io.IOException:
java.lang.OutOfMemoryError: Java heap space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.Object.clone(Native Method)
>         at java.lang.reflect.Method.getParameterTypes(Unknown Source)
>         at java.lang.Class.searchMethods(Unknown Source)
>         at java.lang.Class.getMethod0(Unknown Source)
>         at java.lang.Class.getMethod(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> 2008-03-08 00:29:39,472 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.newKeyIterator(Unknown Source)
>         at java.util.HashMap$KeySet.iterator(Unknown Source)
>         at java.util.HashSet.iterator(Unknown Source)
>         at sun.nio.ch.SelectorImpl.processDeregisterQueue(Unknown Source)
>         at sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:323)
> 2008-03-08 00:31:15,300 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020,
call batchUpdate(enwiki_080103_meta,,1204867086244, 9223372036854775807, org.apache.hadoop.hbase.io.BatchUpdate@2d13981b)
from XX.XX.XX.233:54810: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap
space
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space
>         at java.lang.String.<init>(Unknown Source)
>         at java.lang.StringBuilder.toString(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:415)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
> {code}
> Was given to XX.XX.XX.227 at 00:36:20 but this server is busy replaying a bunch of edits
(Need to stop emitting edits in HStore -- 496 removed outputting skipped edits).  It can't
put the region up immediately; it takes a long time.
> Then given to XX.XX.XX.183 at 00:37:26. It fails to open with:
> {code}
> 2008-03-08 00:37:29,827 INFO org.apache.hadoop.hbase.HRegion: compaction completed on
region enwiki_071018,AYtsfKtThdIJkVLUSKipA-==,1204860383810. Took 5sec
> 2008-03-08 00:37:29,943 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write
to file /hbase/aa0-005-2.u.powerset.com/enwiki_080103/1578810967/page/mapfiles/5679937491167886060/data
by DFSClient_-540201177
>         at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:341)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
> ...
> {code}
> Sends a CLOSE to the master.
> Then 227 says it has successfully opened the region.
> Master says region server XX.XX.XX.227:60020 should not have opened region enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985
> Now the server field in META is empty.
> {code}
> 2008-03-08 00:38:09,167 DEBUG org.apache.hadoop.hbase.HMaster: HMaster.metaScanner
regioninfo: {regionname: enwiki_080103,CzQ7UPCw-AoIn2JzSEN_pV==,1204865434985, startKey: <CzQ7UPCw-AoIn2JzSEN_pV==>,
endKey: <DUwzKe-niVjzlXs1SvrvVk==>, encodedName: 1578810967, offline: true, tableDesc:
{name: enwiki_080103, families: {anchor:={name: anchor, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: misc, max
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none},
page:={name: page, max versions: 3, compression: NONE, in memory: false, max length:
2147483647, bloom filter: none}, redirect:={name: redirect, max versions: 3, compression:
NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: , startCode:
-1
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

