hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4047) BPServiceActor has nested shouldRun loops
Date Fri, 16 Nov 2012 21:45:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499163#comment-13499163 ]

Eli Collins commented on HDFS-4047:

That's correct. I captured that in the comments in the patch but not in the JIRA; sorry,
I should have called it out explicitly here.

-   * No matter what kind of exception we get, keep retrying to offerService().
-   * That's the loop that connects to the NameNode and provides basic DataNode
-   * functionality.
+   * Main loop for each BP thread. It retries on IOExceptions, only
+   * stops when "shouldRun" or "shouldServiceRun" are false, ie
+   * on shutdown or refreshNamenodes (or non-IOE).
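To make the semantics in the new Javadoc concrete, here is a minimal Java sketch of a single main loop that retries on IOException, exits when either flag is cleared, and treats any other exception as fatal for that actor. `ServiceLoopSketch`, its scripted outcomes, and the flag fields are hypothetical stand-ins, not the actual BPServiceActor code or the patch itself.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical stand-in for a per-block-pool actor; not the real BPServiceActor. */
class ServiceLoopSketch implements Runnable {
    volatile boolean shouldRun = true;        // would be cleared on DataNode shutdown
    volatile boolean shouldServiceRun = true; // would be cleared on refreshNamenodes
    int attempts = 0;
    String exitReason = "not started";
    private final Deque<String> script;       // scripted outcomes: "ioe", "rte", or "ok"

    ServiceLoopSketch(String... outcomes) {
        this.script = new ArrayDeque<>(Arrays.asList(outcomes));
    }

    private boolean shouldRun() {
        return shouldRun && shouldServiceRun;
    }

    // Stand-in for one call to offerService(); the outcome of each call is scripted.
    private void offerService() throws IOException {
        attempts++;
        String next = script.poll();
        if (next == null) {
            shouldServiceRun = false; // simulate refreshNamenodes removing this block pool
        } else if (next.equals("ioe")) {
            throw new IOException("simulated NameNode connection failure");
        } else if (next.equals("rte")) {
            throw new IllegalStateException("simulated BP init bug");
        }
        // "ok": a normal pass, nothing to do
    }

    @Override
    public void run() {
        try {
            while (shouldRun()) {
                try {
                    offerService();
                } catch (IOException ioe) {
                    // Transient I/O problem: log it and retry on the next pass.
                    System.err.println("retrying after: " + ioe.getMessage());
                }
            }
            exitReason = "stopped by flag";
        } catch (RuntimeException rte) {
            // Non-IOE: stop this actor instead of soldiering on.
            exitReason = "fatal: " + rte.getMessage();
        }
    }
}
```

For example, outcomes `"ioe", "ok", "ioe"` give four attempts and a flag-driven exit, while `"ioe", "rte", "ok"` stops at the second attempt.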

My thinking, from HDFS-2882 and HDFS-4201, is that we shouldn't soldier on in the case of an
RTE (e.g. an NPE due to a BP failing to initialize), as this likely indicates a host configuration
error. I can also see the point of view that the DN shouldn't stop running just because one BP
failed, since perhaps the other one is alive and well. What do you think?
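The two options can be sketched as a policy switch. This is a hypothetical illustration, assuming a DataNode that tracks one actor per block pool; `BlockPoolFailurePolicy` and its enum values are invented names, not HDFS APIs. FAIL_FAST follows the "don't soldier on" view, while ISOLATE_BLOCK_POOL keeps the DN up as long as any block pool is still served.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical policy sketch: what a DataNode might do when one block pool's actor dies. */
class BlockPoolFailurePolicy {
    enum Policy { FAIL_FAST, ISOLATE_BLOCK_POOL }

    private final Policy policy;
    private final Map<String, Boolean> actorAlive = new LinkedHashMap<>();
    private boolean dataNodeRunning = true;

    BlockPoolFailurePolicy(Policy policy, String... blockPoolIds) {
        this.policy = policy;
        for (String id : blockPoolIds) {
            actorAlive.put(id, true);
        }
    }

    /** Called when the actor for blockPoolId exits because of a RuntimeException. */
    void onActorFailure(String blockPoolId, RuntimeException cause) {
        System.err.println("Block pool " + blockPoolId + " actor died: " + cause);
        actorAlive.put(blockPoolId, false);
        if (policy == Policy.FAIL_FAST) {
            // Treat the RTE as a likely host configuration error: stop the whole DataNode.
            dataNodeRunning = false;
        } else {
            // Keep serving the remaining block pools; stop only when none are left.
            dataNodeRunning = actorAlive.containsValue(true);
        }
    }

    boolean isDataNodeRunning() {
        return dataNodeRunning;
    }
}
```

Under ISOLATE_BLOCK_POOL a misconfigured host can keep running in a degraded state, which is exactly the trade-off raised in the comment.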

> BPServiceActor has nested shouldRun loops
> -----------------------------------------
>                 Key: HDFS-4047
>                 URL: https://issues.apache.org/jira/browse/HDFS-4047
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Minor
>         Attachments: HADOOP-4047.patch, HDFS-4047.patch, hdfs-4047.txt, hdfs-4047.txt
> BPServiceActor#run and offerService both have while shouldRun loops. We only need the
> outer one, i.e. we can hoist the info log from offerService out to run and remove the inner while loop.
> {code}
> BPServiceActor#run:
> while (shouldRun()) {
>   try {
>     offerService();
>   } catch (Exception ex) {
> ...
> offerService:
> while (shouldRun()) {
>   try {
> {code}
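One reading of the description is that offerService() becomes a single pass and run() owns the only shouldRun() loop, with the info log hoisted in front of it. The sketch below assumes that shape; `HoistedLoopSketch` and its scripted pass count are hypothetical, and the real offerService() does considerably more than append to a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the single-loop shape suggested in HDFS-4047. */
class HoistedLoopSketch {
    final List<String> log = new ArrayList<>();
    private int remainingPasses;

    HoistedLoopSketch(int passes) {
        this.remainingPasses = passes;
    }

    // Stand-in for shouldRun(): true until the scripted number of passes is used up.
    boolean shouldRun() {
        return remainingPasses > 0;
    }

    // Assumed post-refactor shape: one pass (e.g. one heartbeat), no inner while loop.
    void offerService() throws Exception {
        remainingPasses--;
        log.add("heartbeat");
    }

    void run() {
        // Info log hoisted out of offerService(), so it is emitted once per actor.
        log.add("starting to offer service");
        while (shouldRun()) { // the only shouldRun() loop
            try {
                offerService();
            } catch (Exception ex) {
                log.add("error: " + ex.getMessage());
            }
        }
    }
}
```

With `new HoistedLoopSketch(3).run()`, the start message is logged once, followed by three heartbeat entries.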

This message is automatically generated by JIRA.