hbase-issues mailing list archives

From "Jean-Daniel Cryans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC
Date Tue, 25 Oct 2011 23:20:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135545#comment-13135545 ]

Jean-Daniel Cryans commented on HBASE-3515:
-------------------------------------------

To reiterate the problem: if the ZK session has expired when a log is rolled, the new HLog
may never be added to the list of logs to replicate. HLog currently gets no feedback from
its WALActionListeners, even when they fail to do their job.

One way to fix it would be to throw an exception and stop the log roll, but when there are
several listeners, some may already have processed the new log by the time one of them fails.
Alternatively, we could simply kill the region server when this happens.

I'm in favor of the latter.
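
For illustration only, here is a minimal Java sketch of the two options. It is not the
attached patch and not HBase's real API: RollListener, Aborter and LogRollSketch are
hypothetical stand-ins for WALActionListener, the region server's abort hook and the
log-rolling code. Only the "Failed add log to list" message is taken from the log quoted
in the issue.

```java
import java.util.List;

/** Hypothetical stand-in for a WAL listener; not the real WALActionListener signature. */
interface RollListener {
    void logRolled(String newLogName) throws Exception;
}

/** Hypothetical hook that would bring the region server down. */
interface Aborter {
    void abort(String why, Throwable cause);
}

final class LogRollSketch {
    private LogRollSketch() {}

    /**
     * Option 1: let the listener's exception out so the caller cancels the roll.
     * Listeners notified before the failing one have already recorded the new log.
     */
    static void notifyOrCancel(List<RollListener> listeners, String newLog) throws Exception {
        for (RollListener listener : listeners) {
            listener.logRolled(newLog); // the first failure propagates to the caller
        }
    }

    /**
     * Option 2: treat any listener failure as fatal and abort the server.
     * Returns true only if every listener accepted the new log.
     */
    static boolean notifyOrAbort(List<RollListener> listeners, String newLog, Aborter aborter) {
        for (RollListener listener : listeners) {
            try {
                listener.logRolled(newLog);
            } catch (Exception e) {
                aborter.abort("Failed add log to list: " + newLog, e);
                return false;
            }
        }
        return true;
    }
}
```

With option 1 the caller is left with listeners in a mixed state, since some know about the
new log and some don't. With option 2 the server simply goes down, so no edits can land in
a log that replication never heard about.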
                
> [replication] ReplicationSource can miss a log after RS comes out of GC
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3515
>                 URL: https://issues.apache.org/jira/browse/HBASE-3515
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3515.patch
>
>
> This is from Hudson build 1738. If a log is about to be rolled and the ZK connection
> is already closed, the replication code will fail to add the new log in ZK, but the
> log will still be rolled and some edits may make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for
>  /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509, length=0
> 2011-02-08 10:21:22,862 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(436):
>  Pushed=31 entries from hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log roll.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
