hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10902) QJM should not consider stale/failed txn available in any one of JNs.
Date Mon, 26 Sep 2016 08:35:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522464#comment-15522464 ]

Vinayakumar B commented on HDFS-10902:

One possible solution could be as follows:

1. From all available JNs, find out the last readable txn available in each JN.
2. Select the *minimum lastReadable txn* that matches across a minimum quorum of JNs as the
max txn to consider valid.
3. If the lastReadableTxn values differ among the available JNs and no quorum of them matches,
the read can be failed, since stale txns should not be considered.
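
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not QJM code: the function name `select_max_valid_txn`, the JN labels, and the reading of "matching" as "the same lastReadable txn reported by at least a majority of all JNs" are assumptions made for the sketch.

```python
from collections import Counter
from typing import Mapping, Optional


def select_max_valid_txn(last_readable: Mapping[str, int],
                         total_jns: int) -> Optional[int]:
    """Return the max txn to treat as valid, or None if the read should fail.

    last_readable maps each *available* JN (hypothetical labels) to the last
    readable txn it reports (step 1). total_jns is the configured number of
    JNs, used to compute the minimum quorum (assumed here to be a majority).
    """
    quorum = total_jns // 2 + 1

    # Step 2: count how many JNs report each lastReadable txn and keep only
    # the values that at least a quorum of JNs agree on.
    counts = Counter(last_readable.values())
    candidates = [txn for txn, n in counts.items() if n >= quorum]

    # Step 3: no quorum agrees on any value, so the read is failed.
    if not candidates:
        return None

    # Take the minimum of the quorum-backed values.
    return min(candidates)


# Illustrative values only: one JN holds a stale txn that the others lack.
reported = {"jn1": 195222, "jn2": 195221, "jn3": 195221}
print(select_max_valid_txn(reported, total_jns=3))
```

With these example values the stale txn on `jn1` is ignored, because only one JN reports it. If all three JNs reported different values, the function would return `None`, corresponding to failing the read in step 3.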

[~andrew.wang], [~kihwal], [~umamaheswararao], [~szetszwo], please provide your opinions on
this, as this seems to be critical to decide valid txns.

> QJM should not consider stale/failed txn available in any one of JNs.
> ---------------------------------------------------------------------
>                 Key: HDFS-10902
>                 URL: https://issues.apache.org/jira/browse/HDFS-10902
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: qjm
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
> In one of our clusters we faced an issue where NameNode restart failed due to a stale/failed
> txn available in one JN but not in the others.
> Scenario is:
> 1. Full cluster restart
> 2. The startLogSegment txn (195222) was synced to only one JN and failed on the others because
> they were shutting down. On the others only the editlog file was created and the txn was not
> synced, so after restart they were marked as empty.
> 3. Cluster restarted. During failover, this new log segment missed recovery because the JN
> holding it was slow in responding to the recovery call.
> 4. Recovery on the other JNs was successful, as there were no in-progress files.
> 5. editlog.openForWrite() detected that (195222) was already available, and failed the
> The same steps repeated until the stale editlog in that JN was manually deleted.
> Since QJM is a quorum of JNs, a txn is considered successful if it is written to a minimum
> quorum of JNs. Otherwise it is considered failed.
> So the same rule should be applied when selecting streams for reading as well.
> Stale/failed txns available in fewer JNs than a quorum should not be considered for reading.
> HDFS-10519 does similar work to consider 'durable' txns based on 'committedTxnId'. But
> updating 'committedTxnId' on every flush with one more RPC seems to be problematic for performance.

This message was sent by Atlassian JIRA

