hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3618) JobClient should keep on retrying if the jobtracker is still initializing
Date Tue, 24 Jun 2008 09:38:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607533#action_12607533 ]

Steve Loughran commented on HADOOP-3618:
----------------------------------------

The wait loop spins and sleeps:

     
+    // check if the jobtracker is ready
+    while (true) {
+      if (jobSubmitClient.isReady()) {
+        break;
+      }
+      try {
+        Thread.sleep(JOBTRACKER_POLL_INTERVAL);
+      } catch (InterruptedException ie){}
+    }

1. If the thread is interrupted, it implies somebody wanted to stop it. Why not listen to
that request by ending the thread, rather than swallowing the InterruptedException and spinning
indefinitely? As written, this loop makes a job client thread impossible to kill in-process
until the tracker is live.
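A minimal sketch of an interrupt-aware version of the loop, assuming readiness can be probed through a `BooleanSupplier` (a stand-in for `jobSubmitClient.isReady()`, which as an RPC call could also throw). The class and method names are hypothetical, not part of the patch. Instead of swallowing `InterruptedException`, the helper lets it propagate, so interrupting the thread ends the wait.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of the HADOOP-3618 patch. */
final class InterruptibleWait {
  private InterruptibleWait() {}

  /**
   * Polls isReady every pollMillis ms until it returns true.
   * If the thread is interrupted, Thread.sleep throws InterruptedException
   * and we let it propagate, so the caller can stop instead of spinning.
   */
  static void waitUntilReady(BooleanSupplier isReady, long pollMillis)
      throws InterruptedException {
    while (!isReady.getAsBoolean()) {
      // Throws immediately if the interrupt flag is already set.
      Thread.sleep(pollMillis);
    }
  }
}
```

A caller that cannot declare `InterruptedException` would typically catch it, call `Thread.currentThread().interrupt()` to restore the flag, and then return or fail rather than loop again.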

2. In other projects, we've found problems when a few hundred machines have just come up fully
synchronised, as they can when a site's power gets toggled. They all poll simultaneously,
flood the network and then wait... Even with exponential back-off they stay in sync. So
a bit of random jitter on the sleep is good; likewise, the poll interval could be a configuration
point.
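To illustrate the jitter point, here is a minimal sketch of capped exponential back-off where each delay is scaled by a random factor. The class `JitteredBackoff` and its parameters are hypothetical; `baseMillis` plays the role of the configurable poll interval (in Hadoop it would presumably come from the configuration under some key, which is not specified here).

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical helper (not part of Hadoop): exponential back-off with random
 * jitter, so clients started at the same moment drift apart instead of
 * polling in lock-step.
 */
final class JitteredBackoff {
  private final long baseMillis;       // first delay: the configurable poll interval
  private final long maxMillis;        // cap on the nominal delay
  private final double jitterFraction; // 0.2 means plus or minus 20% around the nominal delay

  JitteredBackoff(long baseMillis, long maxMillis, double jitterFraction) {
    if (baseMillis <= 0 || maxMillis < baseMillis) {
      throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
    }
    if (jitterFraction < 0.0 || jitterFraction > 1.0) {
      throw new IllegalArgumentException("jitterFraction must be in [0, 1]");
    }
    this.baseMillis = baseMillis;
    this.maxMillis = maxMillis;
    this.jitterFraction = jitterFraction;
  }

  /** Nominal delay for a 0-based attempt: base * 2^attempt, capped at maxMillis. */
  long nominalDelayMillis(int attempt) {
    long delay = baseMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Double without overflow: anything past half the cap goes straight to the cap.
      delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
    }
    return delay;
  }

  /** Nominal delay scaled by a random factor in [1 - jitter, 1 + jitter). */
  long delayMillis(int attempt) {
    double r = ThreadLocalRandom.current().nextDouble(); // uniform in [0, 1)
    double factor = 1.0 + jitterFraction * (2.0 * r - 1.0);
    return Math.round(nominalDelayMillis(attempt) * factor);
  }
}
```

Because every host draws its own random factor on every attempt, two clients that start at the same instant will almost certainly sleep for different times from the first poll onwards.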

If this sleep-until-ready pattern is common, it should be factored out into a shared method of
its own. I've been stubbing out (for my deployment use) a simple lifecycle interface
(start/stop/getStatus/ping); if that were adopted, this patch could poll the getStatus()
method.
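A sketch of what the factored-out helper might look like, assuming a lifecycle interface along the lines described above. `Lifecycle`, its `Status` values and `LifecycleWaiter.awaitLive` are hypothetical names invented for illustration, not an existing Hadoop API. The helper polls `getStatus()`, fails fast if the service has failed or stopped, adds jitter to each sleep, gives up after a timeout, and lets `InterruptedException` propagate.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical lifecycle interface: start/stop/getStatus/ping. */
interface Lifecycle {
  enum Status { CREATED, STARTING, LIVE, STOPPED, FAILED }

  void start() throws Exception;

  void stop();

  Status getStatus();

  /** Liveness probe; expected to throw if the service is unhealthy. */
  void ping() throws Exception;
}

/** Hypothetical shared "sleep until ready" helper built on getStatus(). */
final class LifecycleWaiter {
  private LifecycleWaiter() {}

  /**
   * Blocks until the service reports LIVE.
   * Throws IllegalStateException if it reports FAILED or STOPPED,
   * TimeoutException if it is not LIVE within timeoutMillis, and
   * InterruptedException if the waiting thread is interrupted.
   */
  static void awaitLive(Lifecycle service, long pollMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    if (pollMillis <= 0) {
      throw new IllegalArgumentException("pollMillis must be positive");
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      Lifecycle.Status status = service.getStatus();
      if (status == Lifecycle.Status.LIVE) {
        return;
      }
      if (status == Lifecycle.Status.FAILED || status == Lifecycle.Status.STOPPED) {
        throw new IllegalStateException("service will not become live: " + status);
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        throw new TimeoutException("service still " + status + " after " + timeoutMillis + " ms");
      }
      // Jittered sleep in [pollMillis/2, pollMillis/2 + pollMillis].
      long sleepMillis = pollMillis / 2 + ThreadLocalRandom.current().nextLong(pollMillis + 1);
      // Do not sleep much past the deadline.
      sleepMillis = Math.min(sleepMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos) + 1);
      Thread.sleep(sleepMillis); // an interrupt ends the wait with InterruptedException
    }
  }
}
```

With such a helper, the JobClient loop would reduce to a single `awaitLive(...)` call, and other daemons waiting on each other could share the same interrupt, jitter and timeout behaviour.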

> JobClient should keep on retrying if the jobtracker is still initializing
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3618
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3618
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3618.patch
>
>
> When the user submits the job while the jobtracker is still initializing, the jobclient
> comes out with an exception. Ideally the jobclient should keep on retrying until the jobtracker
> is up and ready. This will also take care of HADOOP-3289.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

