hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
Date Mon, 17 Sep 2012 20:39:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457296#comment-13457296
] 

Devaraj Das commented on HBASE-6758:
------------------------------------

[~jdcryans] Thanks for looking. Responses below.

bq. My understanding of this patch is that it reduces the race condition but it still leaves
a small window eg you can take the "fileNotInUse" snapshot, get "false", and the moment after
that the log could roll. If this is correct, I'm not sure it's worth the added complexity.

I don't think there is ever that window. The replication executor thread picks up a path that
the LogRoller puts in the replicator's queue BEFORE the log roll happens (and the HLog constructor
puts the first path before the replication executor starts). The replication executor is always
trailing, and so when the HLog guy says that a path is not in use (being written to), it seems
to me a fact that it indeed is not being written to and any writes that ever happened was
in the past. Also note that the currentPath is reset AFTER a log roll, which is kind of delayed..

bq. It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for the time
we read the log to be 100% sure log rolling doesn't happen. This has multiple side effects
like delaying flushes and log rolls for a few ms while replication is reading the log. It
would also require having a way to get to the WAL from ReplicationSource.

Yeah, I tried my best to avoid taking that crucial lock!

bq. Anyways, one solution I can think of that doesn't involve leaking HRS into replication
would be giving the log a "second chance". Basically if you get an EOF, flip the secondChance
bit. If it's on then you don't get rid of that log yet. Reset the bit when you loop back to
read, now if there was new data added you should get it else go to the next log.

I considered some variant of this. However, I gave it up and took a more conservative approach
- make sure that the replication-executor thread gets at least one pass at a CLOSED file.
All other solutions seemed incomplete to me and prone to races...

[~stack] forgot to answer one of your previous questions.
bq. Should currentFilePath be an atomic reference so all threads see the changes when they
happen?

I think volatile suffices for the use case here.
                
> [replication] The replication-executor should make sure the file that it is replicating
is closed before declaring success on that file
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6758
>                 URL: https://issues.apache.org/jira/browse/HBASE-6758
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>         Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate since the
file hasn't been closed yet. Upon closing, the new data becomes visible. Before that happens
the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes
need to be made in ReplicationSource.processEndOfFile as well (currentPath related).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message