From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Mon, 5 Feb 2007 12:09:05 -0800 (PST)
Subject: [jira] Commented: (HADOOP-491) streaming jobs should allow programs that don't do any IO for a long time
Message-ID: <12007659.1170706145804.JavaMail.jira@brutus>
In-Reply-To: <12167475.1156886482398.JavaMail.jira@brutus>

    [ 
https://issues.apache.org/jira/browse/HADOOP-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470338 ]

Doug Cutting commented on HADOOP-491:
-------------------------------------

So streaming jobs should have no timeout by default? I can sort of see adding the feature of disabling task timeouts, and also of facilitating this from streaming, but do streaming applications really never hang? Should we change the default for all applications, not just streaming? I'm trying to understand the logic here.

Also, as a new feature, shouldn't this be targeted for 0.12.0?

> streaming jobs should allow programs that don't do any IO for a long time
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-491
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/streaming
>    Affects Versions: 0.11.0
>            Reporter: arkady borkovsky
>         Assigned To: Arun C Murthy
>             Fix For: 0.11.1
>
>         Attachments: HADOOP-491_20070205_1.patch
>
>
> The jobtracker relies on tasks to send heartbeats to know that they are still alive.
> There is a preset timeout of 600 seconds.
> hadoop streaming also uses input to or output from the program it spawns to indicate progress, sending appropriate heartbeats.
> Some spawned programs spend longer than 600 seconds without any output while being perfectly healthy.
> It would be good to enhance the interface between hadoop streaming and the programs it spawns to track a healthy program in the absence of output.
> There are certain dangers with this protocol: e.g. a task can run a separate thread that does nothing but send "I'm alive" messages. It would be a user bug to abuse the API in such a way.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
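
For readers following the thread, the timeout behaviour under discussion can be sketched roughly as below. This is only an illustration and not Hadoop code: `TaskWatchdog`, `report_progress`, `is_timed_out`, and `FakeClock` are hypothetical names, and treating a timeout of zero or less as "disabled" is an assumption made for the sketch. The 600-second default comes from the issue description.

```python
import time


class TaskWatchdog:
    """Tracks when a task last showed signs of life (hypothetical helper)."""

    def __init__(self, timeout_secs=600.0, clock=time.monotonic):
        # Assumption for this sketch: timeout_secs <= 0 disables the timeout.
        self.timeout_secs = timeout_secs
        self._clock = clock
        self._last_progress = clock()

    def report_progress(self):
        """Call on any input, output, or explicit "I'm alive" signal."""
        self._last_progress = self._clock()

    def is_timed_out(self):
        """True if the task has been silent for longer than the timeout."""
        if self.timeout_secs <= 0:
            return False
        return self._clock() - self._last_progress > self.timeout_secs


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, secs):
        self.now += secs


if __name__ == "__main__":
    clock = FakeClock()
    task = TaskWatchdog(timeout_secs=600.0, clock=clock)

    clock.advance(601)            # healthy program, but silent for too long
    print(task.is_timed_out())    # the watchdog would now declare it dead

    task.report_progress()        # an explicit keep-alive resets the timer
    print(task.is_timed_out())
```

The sketch also shows the danger raised in the issue: nothing stops a hung program from calling `report_progress` in a background thread, so an explicit keep-alive only helps when it is tied to real work.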