hbase-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-251) [hbase] Stuck replaying the edits of crashed machine
Date Sat, 08 Mar 2008 02:05:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576470#action_12576470 ]

Bryan Duxbury commented on HBASE-251:
-------------------------------------

It seems we can't get an fsck run to give us a sense of what's going on in the DFS, and this
issue has been open a long time.

Should we just handle the EOFException by not replaying any further edits from a busted log?
That seems like the easiest thing to do.
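
To make the suggestion concrete, here is a minimal sketch of one reading of it: keep the edits read so far and treat the EOFException as the end of that log instead of failing the whole shutdown processing. It does not use the real HLog or SequenceFile code. The class TruncatedLogReplay, its methods, and the length-prefixed record format are all hypothetical stand-ins chosen so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for log replay; NOT the HBase HLog implementation.
final class TruncatedLogReplay {

    /** Edits recovered from one log, plus a flag saying the log ended mid-record. */
    public static final class ReplayResult {
        public final List<String> edits;
        public final boolean truncated;

        ReplayResult(List<String> edits, boolean truncated) {
            this.edits = edits;
            this.truncated = truncated;
        }
    }

    /** Writes each edit as a 4-byte length followed by its UTF-8 bytes (assumed format). */
    public static byte[] buildLog(String... edits) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String edit : edits) {
                byte[] payload = edit.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads edits until the log ends; an EOF inside a record stops the replay quietly. */
    public static ReplayResult replay(byte[] log) {
        List<String> edits = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (in.available() > 0) {
                int length = in.readInt();      // throws EOFException on a cut-off length
                byte[] payload = new byte[length];
                in.readFully(payload);          // throws EOFException on a cut-off record
                edits.add(new String(payload, StandardCharsets.UTF_8));
            }
        } catch (EOFException e) {
            // Busted log: keep what was read and stop, rather than retrying forever.
            return new ReplayResult(edits, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ReplayResult(edits, false);
    }

    public static void main(String[] args) {
        byte[] full = buildLog("edit-1", "edit-2", "edit-3");
        byte[] cut = Arrays.copyOf(full, full.length - 2); // simulate a crash mid-write
        ReplayResult result = replay(cut);
        System.out.println("recovered " + result.edits.size()
                + " edits, truncated=" + result.truncated);
    }
}
```

The truncated flag is there so the caller can at least log a warning that the tail of the file was dropped. Whether losing those trailing edits is acceptable is the open question in this issue.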

> [hbase] Stuck replaying the edits of crashed machine
> ----------------------------------------------------
>
>                 Key: HBASE-251
>                 URL: https://issues.apache.org/jira/browse/HBASE-251
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.1.0
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.1.0
>
>
> The Rapleaf master got stuck trying to replay the logs of the server holding the .META. region.
> Here are the pertinent log excerpts:
> {code}
> 2008-01-12 02:17:42,621 DEBUG org.apache.hadoop.hbase.HLog: Creating new log file writer for path /data/hbase1/hregion_1679905157/oldlogfile.log; map content {
>   spider_pages,25_530417241,1200073704087=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@24336556,
>   spider_pages,6_74488371,1200029312876=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@2a4203ab,
>   spider_pages,2_561473281,1200054637086=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@b972625,
>   .META.,,1=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@67a044a7,
>   spider_pages,5_544278041,1199025825074=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@42be0008,
>   spider_pages,49_567090611,1200028378117=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@7bf4cbaa,
>   spider_pages,5_566039401,1200058871594=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@16479e88,
>   spider_pages,59_360738971,1200073647952=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@70494d14,
>   spider_pages,59_302628011,1200073647951=org.apache.hadoop.io.SequenceFile$RecordCompressWriter@654670a8}
> 2008-01-12 02:17:44,124 DEBUG org.apache.hadoop.hbase.HLog: Applied 20000 edits
> 2008-01-12 02:17:49,076 DEBUG org.apache.hadoop.hbase.HLog: Applied 30000 edits
> 2008-01-12 02:17:49,078 DEBUG org.apache.hadoop.hbase.HLog: Applied 30003 total edits
> 2008-01-12 02:17:49,078 DEBUG org.apache.hadoop.hbase.HLog: Splitting 1 of 2: hdfs://tf1:7276/data/hbase1/log_XX.XX.XX.32_1200011947645_60020/hlog.dat.003
> 2008-01-12 02:17:52,574 DEBUG org.apache.hadoop.hbase.HLog: Applied 10000 edits
> 2008-01-12 02:17:59,822 WARN org.apache.hadoop.hbase.HMaster: Processing pending operations: ProcessServerShutdown of XX.XX.XX.32:60020
> java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1763)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1663)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1709)
>         at org.apache.hadoop.hbase.HLog.splitLog(HLog.java:168)
>         at org.apache.hadoop.hbase.HMaster$ProcessServerShutdown.process(HMaster.java:2144)
>         at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1056)
> 2008-01-12 02:17:59,822 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessServerShutdown of XX.XX.XX.32:60020
> {code}
> It keeps doing the above over and over again.
> I suppose we could skip bad logs, or just shut down the master with a reason why.
> What's odd is that we seem to be well into the file (we've run over 10000 edits) before
> we trip over the EOF.
> I've asked for an fsck to see what that says.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

