hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3752) BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
Date Thu, 09 Aug 2012 12:18:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431765#comment-13431765 ]

Rakesh R commented on HDFS-3752:
--------------------------------

Thanks Todd for looking into the issue. I have a few points and would like to know your thoughts.

@Todd
{quote}It seems this is because the BKJournalManager doesn't support selectInputStreams with
inProgressOK == true, right?
Maybe we can introduce a new API which BKJM (and QJM) can implement, which would return the
list of available edits ranges, but not necessarily be available to read them (since these
journals don't allow reading from in-progress edits). That would solve the issue, right? Do
you have an idea for such an API?
{quote}

Yes, there is a bug on the BKJM side when reading the in-progress ledger. It arises as follows:
during bootstrapStandby, the node checks whether transactions from txid + 1 onwards exist in
the shared storage before copying fsImage_txid. If the in-progress ledger contains only one
entry (the txid + 1 entry), the BookKeeper readLastConfirmed() API returns '-1' as the last
confirmed entry instead of the actual last transaction entry (this is problematic behaviour
in BookKeeper).
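
To make the '-1' case concrete, here is a rough sketch. It is a toy model, not the BookKeeper client API: ToyLedger, addEntry() and lastConfirmedAfter() are made-up names, and the model assumes that each add carries the id of the last entry confirmed before it, so a reader of a still-open ledger only learns about entry N once entry N + 1 has been written.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types belong to BookKeeper or HDFS. */
public class LacSketch {

    /** Hypothetical in-progress ledger that records the last-confirmed id stored with each add. */
    static final class ToyLedger {
        private final List<Long> piggybackedLac = new ArrayList<>();

        /** Appends an entry and returns its 0-based entry id. */
        long addEntry() {
            long entryId = piggybackedLac.size();
            // Assumption: the writer can only vouch for entries acknowledged before this one.
            piggybackedLac.add(entryId - 1);
            return entryId;
        }

        /** What a reader of the open ledger sees: the value stored with the newest entry. */
        long readLastConfirmed() {
            if (piggybackedLac.isEmpty()) {
                return -1L;
            }
            return piggybackedLac.get(piggybackedLac.size() - 1);
        }
    }

    /** Returns the reader-visible last confirmed entry id after the given number of adds. */
    static long lastConfirmedAfter(int adds) {
        ToyLedger ledger = new ToyLedger();
        for (int i = 0; i < adds; i++) {
            ledger.addEntry();
        }
        return ledger.readLastConfirmed();
    }

    public static void main(String[] args) {
        // Ledger holding a single entry, as right after saveNameSpace.
        System.out.println(lastConfirmedAfter(1));
        // Ledger holding three entries.
        System.out.println(lastConfirmedAfter(3));
    }
}
```

Under this model the single-entry ledger reports -1, which matches what bootstrapStandby runs into when it looks for txid + 1.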

I agree that we should avoid reading entries from the in-progress file in the defect scenario
described by Vinay.

I have one more doubt: why does copying fsImage_txid look at the shared storage at all? Is
the intention to perform a sanity check on whether the shared storage is available?

At present the Standby node tails logs only from finalized log segments. Along similar lines,
this flow could also copy the fsImage directly, without checking the transactions present
in the in-progress file (in the shared storage), and start as Standby. The next tailing cycle
will do the rollover and read the edits anyway. How does that sound?

If we cannot avoid the sanity check of the shared storage, then I feel bootstrap can force a
rollover and then check only up to the finalized log segments.
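
A minimal sketch of that last idea, only to illustrate the check: Segment and finalizedEditsContain() are hypothetical names, not HDFS classes, and the txids (image at 100, next segment starting at 101) are made-up example values.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: Segment and the method below are hypothetical, not HDFS classes. */
public class BootstrapCheckSketch {

    /** One edit log segment in shared storage. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;       // meaningful only when the segment is finalized
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /** True if some finalized segment contains txId; in-progress segments are never read. */
    static boolean finalizedEditsContain(List<Segment> segments, long txId) {
        for (Segment s : segments) {
            if (!s.inProgress && s.firstTxId <= txId && txId <= s.lastTxId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long imageTxId = 100;

        // Just after saveNameSpace: txid 101 exists only in the in-progress segment.
        List<Segment> beforeRoll = Arrays.asList(
                new Segment(1, 100, false),
                new Segment(101, -1, true));

        // After a forced rollover: the segment starting at 101 is finalized.
        List<Segment> afterRoll = Arrays.asList(
                new Segment(1, 100, false),
                new Segment(101, 102, false),
                new Segment(103, -1, true));

        System.out.println(finalizedEditsContain(beforeRoll, imageTxId + 1));
        System.out.println(finalizedEditsContain(afterRoll, imageTxId + 1));
    }
}
```

The point of the sketch is that once the rollover is forced, the check for txid + 1 can succeed without ever touching an in-progress segment.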
                
> BOOTSTRAPSTANDBY for new Standby node will not work just after saveNameSpace at ANN in case of BKJM
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3752
>                 URL: https://issues.apache.org/jira/browse/HDFS-3752
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: 2.1.0-alpha
>            Reporter: Vinay
>
> 1. do {{saveNameSpace}} in ANN node by entering into safemode
> 2. in another new node, install standby NN and do BOOTSTRAPSTANDBY
> 3. Now the StandBy NN will not be able to copy the fsimage_txid from ANN
> This is because the SNN is not able to find the next txid (txid+1) in shared storage.
> Just after {{saveNameSpace}}, shared storage will have the new log segment with only the START_LOG_SEGMENT edits op,
> and BookKeeper will not be able to read the last entry from the in-progress ledger.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
