hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-3515) [replication] ReplicationSource can miss a log after RS comes out of GC
Date Tue, 08 Feb 2011 19:25:58 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3515:
--------------------------------------

    Attachment: HBASE-3515.patch

This patch simply adds a check for whether the region server is going down before closing the file.
In the replication case this fixes the issue, because when replication fails it sets that flag to true.
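
As a rough illustration of that check (not the patch itself), here is a minimal Java sketch. Every name in it (ServerStatus, RollListener, ReplicationLikeListener, GuardedRoller) is hypothetical and does not come from HBase, and it assumes that "closing the file" is the step that completes the switch to the new log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the region server's "going down" flag.
final class ServerStatus {
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    void markStopping() { stopping.set(true); }

    boolean isStopping() { return stopping.get(); }
}

// Hypothetical listener type, modelled on the i.logRolled(newPath) call.
interface RollListener {
    void logRolled(String newPath);
}

// Mimics the replication case: when the new log cannot be registered,
// the listener does not throw; it flags the server as going down.
final class ReplicationLikeListener implements RollListener {
    private final ServerStatus status;
    private final boolean registryReachable;
    final List<String> registered = new ArrayList<>();

    ReplicationLikeListener(ServerStatus status, boolean registryReachable) {
        this.status = status;
        this.registryReachable = registryReachable;
    }

    @Override
    public void logRolled(String newPath) {
        if (!registryReachable) {
            status.markStopping();
            return;
        }
        registered.add(newPath);
    }
}

final class GuardedRoller {
    private final ServerStatus status;
    private final List<? extends RollListener> listeners;
    private String currentPath;

    GuardedRoller(ServerStatus status, List<? extends RollListener> listeners,
                  String initialPath) {
        this.status = status;
        this.listeners = listeners;
        this.currentPath = initialPath;
    }

    // Returns true if the roll completed, false if it was skipped.
    boolean roll(String newPath) {
        for (RollListener listener : listeners) {
            listener.logRolled(newPath);
        }
        // The check: if a listener flagged the server as going down,
        // leave the current file alone instead of closing it.
        if (status.isStopping()) {
            return false;
        }
        // Assumption: this assignment stands in for closing the old file
        // and switching writes over to the new one.
        currentPath = newPath;
        return true;
    }

    String currentPath() { return currentPath; }
}
```

With a listener whose registry is unreachable, roll("log.2") returns false and currentPath() is still the old path, so no edits land in a log that was never registered.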

The issue with throwing an exception from i.logRolled(newPath) is that, since there are potentially
many such listeners, throwing midway would mean having to implement a rollback. Nothing
at the moment requires that level of complexity.
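
To make the rollback concern concrete, here is a small Java sketch. The names (PathListener, RecordingListener, FailingListener, Notifier) are hypothetical and not HBase classes. It shows only the general hazard: when one listener throws partway through the loop, the earlier listeners have already recorded the new path and the later ones never see it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type, modelled on the i.logRolled(newPath) call.
interface PathListener {
    void logRolled(String newPath);
}

// Remembers every path it was told about.
final class RecordingListener implements PathListener {
    final List<String> seen = new ArrayList<>();

    @Override
    public void logRolled(String newPath) {
        seen.add(newPath);
    }
}

// Always fails, standing in for a listener that cannot record the new log.
final class FailingListener implements PathListener {
    @Override
    public void logRolled(String newPath) {
        throw new IllegalStateException("cannot record " + newPath);
    }
}

final class Notifier {
    private Notifier() { }

    // Notifies listeners in order and lets any exception propagate.
    // Listeners before the failing one keep the new path; listeners
    // after it are never called.
    static void fireLogRolled(List<? extends PathListener> listeners, String newPath) {
        for (PathListener listener : listeners) {
            listener.logRolled(newPath);
        }
    }
}
```

With the order [recording, failing, recording], the first listener ends up holding the new path while the third holds nothing. Cancelling the roll cleanly would require telling the first listener to forget that path, which is the rollback mentioned above.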

> [replication] ReplicationSource can miss a log after RS comes out of GC
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3515
>                 URL: https://issues.apache.org/jira/browse/HBASE-3515
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: HBASE-3515.patch
>
>
> This is from Hudson build 1738. If a log is about to be rolled and the ZK connection
> is already closed, the replication code will fail to add the new log in ZK, but the
> log will still be rolled and it's possible that some edits will make it in.
> From the log:
> {quote}
> 2011-02-08 10:21:20,618 FATAL [RegionServer:0;vesta.apache.org,46117,1297160399378.logRoller] regionserver.HRegionServer(1383):
>  ABORTING region server serverName=vesta.apache.org,46117,1297160399378, load=(requests=1525, regions=12,
>  usedHeap=273, maxHeap=1244): Failed add log to list
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for
>  /1/replication/rs/vesta.apache.org,46117,1297160399378/2/vesta.apache.org%3A46117.1297160480509
> ...
> 2011-02-08 10:21:22,444 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(258):
>  Splitting hlog 8 of 8: hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509, length=0
> 2011-02-08 10:21:22,862 DEBUG [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:56008-0] wal.HLogSplitter(436):
>  Pushed=31 entries from hdfs://localhost:55474/user/hudson/.logs/vesta.apache.org,46117,1297160399378/vesta.apache.org%3A46117.1297160480509
> {quote}
> The easiest thing to do would be to let the exception out and cancel the log roll.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
