hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4222) Make HLog more resilient to write pipeline failures
Date Fri, 19 Aug 2011 19:00:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087899#comment-13087899 ]

Ted Yu commented on HBASE-4222:
-------------------------------

@Gary:
Can you rebase the patch now that HBASE-4095 has been integrated? It no longer applies cleanly:
{code}
Hunk #7 succeeded at 1055 (offset 21 lines).
1 out of 7 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
patching file src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
Hunk #1 FAILED at 19.
Hunk #2 FAILED at 67.
Hunk #3 succeeded at 122 (offset -2 lines).
Hunk #4 succeeded at 378 with fuzz 2 (offset 42 lines).
2 out of 4 hunks FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java.rej
{code}
Thanks

> Make HLog more resilient to write pipeline failures
> ---------------------------------------------------
>
>                 Key: HBASE-4222
>                 URL: https://issues.apache.org/jira/browse/HBASE-4222
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>             Fix For: 0.92.0
>
>
> The current implementation of HLog rolling to recover from transient errors in the write
pipeline seems to have two problems:
> # When {{HLog.LogSyncer}} triggers an {{IOException}} during time-based sync operations,
it triggers a log rolling request in the corresponding catch block, but only after escaping
from the internal while loop.  As a result, the {{LogSyncer}} thread will exit and never be
restarted from what I can tell, even if the log rolling was successful.
> # Log rolling requests triggered by an {{IOException}} in {{sync()}} or {{append()}}
never happen if no entries have yet been written to the log.  This means that write errors
are not immediately recovered, which extends the exposure to more errors occurring in the
pipeline.
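
The first problem above can be illustrated with a minimal sketch. It is not the actual HLog.LogSyncer code: the class SyncLoopSketch, the Syncable interface, and both method names are hypothetical, and the bounded loop stands in for the thread's real run loop. It only contrasts a catch block placed outside the loop with one placed inside it.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a periodic sync loop; not HBase's actual code. */
class SyncLoopSketch {
    /** Hypothetical writer abstraction with a sync that may fail. */
    interface Syncable {
        void sync() throws IOException;
    }

    final AtomicInteger iterations = new AtomicInteger();
    final AtomicInteger rollRequests = new AtomicInteger();

    /** Shape described in the issue: the first failure leaves the loop for good. */
    void runCatchOutsideLoop(Syncable writer, int maxIterations) {
        try {
            for (int i = 0; i < maxIterations; i++) {
                iterations.incrementAndGet();
                writer.sync();
            }
        } catch (IOException e) {
            // A roll is requested, but the loop has already been exited.
            rollRequests.incrementAndGet();
        }
    }

    /** Alternative shape: request a roll and keep looping. */
    void runCatchInsideLoop(Syncable writer, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            iterations.incrementAndGet();
            try {
                writer.sync();
            } catch (IOException e) {
                // Request a roll; the next iteration syncs again.
                rollRequests.incrementAndGet();
            }
        }
    }
}
```

With a writer that fails only once, the first variant stops at the failing iteration while the second completes every iteration and still requests exactly one roll.
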
> In addition, it seems like we should be able to better handle transient problems, like
a rolling restart of DataNodes while the HBase RegionServers are running.  Currently this
will reliably cause RegionServer aborts during log rolling: either an append or time-based
sync triggers an initial {{IOException}}, initiating a log rolling request.  However, the log
rolling then fails in closing the current writer ("All datanodes are bad"), causing a RegionServer
abort.  In this case, it seems like we should at least offer an option to continue with
the new writer and only abort on subsequent errors.
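
The option proposed in the last paragraph could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class RollSketch, the Writer interface, the tolerateCloseFailure flag, and the reading of "subsequent errors" as "a close failure on the very next roll". It does not reflect the attached patch, and it does not address what happens to edits still held by the writer that failed to close.

```java
import java.io.IOException;

/** Hypothetical log roller that can tolerate one failed close; not HBase's actual code. */
class RollSketch {
    /** Hypothetical writer abstraction whose close may fail. */
    interface Writer {
        void close() throws IOException;
    }

    private final boolean tolerateCloseFailure; // hypothetical option
    private Writer current;
    private int consecutiveCloseFailures = 0;

    RollSketch(Writer initial, boolean tolerateCloseFailure) {
        this.current = initial;
        this.tolerateCloseFailure = tolerateCloseFailure;
    }

    /**
     * Switches to the next writer and then closes the old one.
     * Returns true if the caller may continue, false if it should abort.
     */
    boolean roll(Writer next) {
        Writer old = current;
        current = next;
        try {
            old.close();
            consecutiveCloseFailures = 0;
            return true;
        } catch (IOException e) {
            consecutiveCloseFailures++;
            // Continue only if the option is on and this is the first failure in a row.
            return tolerateCloseFailure && consecutiveCloseFailures == 1;
        }
    }
}
```

With the flag off, the first failed close tells the caller to abort, which matches the current behavior described above. With the flag on, the caller continues on the new writer and is told to abort only if the following roll fails to close as well.
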

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
