hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-4222) Make HLog more resilient to write pipeline failures
Date Thu, 18 Aug 2011 01:25:27 GMT
Make HLog more resilient to write pipeline failures
---------------------------------------------------

                 Key: HBASE-4222
                 URL: https://issues.apache.org/jira/browse/HBASE-4222
             Project: HBase
          Issue Type: Improvement
          Components: wal
            Reporter: Gary Helmling
             Fix For: 0.92.0


The current implementation of HLog rolling to recover from transient errors in the write pipeline
seems to have two problems:

# When {{HLog.LogSyncer}} hits an {{IOException}} during a time-based sync, the corresponding
catch block requests a log roll, but only after control has escaped the internal while loop.
As a result, the {{LogSyncer}} thread exits and, from what I can tell, is never restarted,
even if the log roll succeeds.
# Log rolling requests triggered by an {{IOException}} in {{sync()}} or {{append()}} never
happen if no entries have yet been written to the log.  This means that write errors are not
recovered from immediately, which prolongs the exposure to further errors in the pipeline.
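
To illustrate the first problem, here is a minimal sketch of a sync loop whose try/catch sits inside the loop, so a failed sync requests a roll and the loop keeps running. The names {{WalWriter}}, {{ResilientSyncer}} and {{runSyncLoop}} are hypothetical stand-ins rather than HBase classes, and the bounded for loop replaces the real timed while loop only to keep the example deterministic.

```java
import java.io.IOException;

// Hypothetical stand-in for the WAL writer; not HBase's actual interface.
interface WalWriter {
    void sync() throws IOException;
}

// Hypothetical syncer: the try/catch is inside the loop, so a failed
// sync requests a log roll and the loop continues instead of exiting.
final class ResilientSyncer {
    private final WalWriter writer;
    private final Runnable rollRequester;

    ResilientSyncer(WalWriter writer, Runnable rollRequester) {
        this.writer = writer;
        this.rollRequester = rollRequester;
    }

    // Bounded stand-in for the periodic loop.
    // Returns the number of sync attempts that succeeded.
    int runSyncLoop(int iterations) {
        int succeeded = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                writer.sync();
                succeeded++;
            } catch (IOException e) {
                // Ask for a roll, then stay in the loop so later
                // iterations can sync against the replacement writer.
                rollRequester.run();
            }
        }
        return succeeded;
    }
}
```

With a writer that fails only on its second sync, five iterations give four successful syncs and a single roll request; the loop itself never terminates early.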

In addition, it seems like we should be able to better handle transient problems, like a rolling
restart of DataNodes while the HBase RegionServers are running.  Currently this will reliably
cause RegionServer aborts during log rolling: either an append or time-based sync triggers
an initial {{IOException}}, initiating a log rolling request.  However the log rolling then
fails in closing the current writer ("All datanodes are bad"), causing a RegionServer abort.
In this case, it seems like we should at least provide an option to continue with the new
writer and abort only on subsequent errors.
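
As a rough sketch of the proposed option, the roll below swaps in the new writer even when closing the old one fails, provided a flag allows it. Everything here is hypothetical: {{ClosableWriter}}, {{RollingLog}} and the {{tolerateCloseFailure}} flag are illustrative names, not existing HBase classes or configuration. The sketch also ignores what happens to edits still buffered in the old writer, which a real implementation would have to consider.

```java
import java.io.IOException;

// Hypothetical stand-in for a log writer; not HBase's actual interface.
interface ClosableWriter {
    void close() throws IOException;
}

// Hypothetical log that can optionally survive a failed close during a roll.
final class RollingLog {
    private ClosableWriter current;
    private final boolean tolerateCloseFailure; // assumed option, not a real setting
    private int toleratedCloseFailures = 0;

    RollingLog(ClosableWriter initial, boolean tolerateCloseFailure) {
        this.current = initial;
        this.tolerateCloseFailure = tolerateCloseFailure;
    }

    // Closes the current writer and switches to the next one.
    // In strict mode a close failure propagates (the caller would abort)
    // and the current writer is left unchanged.
    void roll(ClosableWriter next) throws IOException {
        try {
            current.close();
        } catch (IOException e) {
            if (!tolerateCloseFailure) {
                throw e;
            }
            // Record the failure and carry on with the new writer.
            toleratedCloseFailures++;
        }
        current = next;
    }

    ClosableWriter current() {
        return current;
    }

    int toleratedCloseFailures() {
        return toleratedCloseFailures;
    }
}
```

In tolerant mode, a close that throws is counted and the new writer takes over; errors raised later by the new writer would still reach the caller and could trigger the abort.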

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira