From: "Vinod Kumar Vavilapalli (JIRA)"
To: core-dev@hadoop.apache.org
Date: Wed, 16 Jul 2008 22:57:32 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-3581) Prevent memory intensive user tasks from taking down nodes

    [ https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614231#action_12614231 ]

Vinod Kumar Vavilapalli commented on HADOOP-3581:
-------------------------------------------------

Summarizing the discussion so far (illustrative sketches of each piece follow the list):

- The TaskTracker tracks the memory usage of all tasks and their sub-processes, irrespective of which user runs which task.

- It uses per-task objects of classes implementing ProcessTree, as described earlier. Currently we implement ProcfsBasedProcessTree, which works on both Linux and Cygwin. For other OSes we would need additional classes extending ProcessTree.

- We will have two configuration properties: a per-tracker property, mapred.tasktracker.tasks.maxmemory, specifying the maximum memory usable across all tasks on a TaskTracker; and a per-job property, mapred.map.memlimit.percent, specifying the memory usable across all map tasks as a percentage of mapred.tasktracker.tasks.maxmemory. The maximum memory usable by reduce tasks is then (100 - mapred.map.memlimit.percent)% of mapred.tasktracker.tasks.maxmemory. All of these are virtual memory limits, not working-set limits. By default, we can set mapred.tasktracker.tasks.maxmemory to 12GB (4GB RAM + 8GB swap) and mapred.map.memlimit.percent to 33%. Does that sound reasonable?

- After every heartbeat, the TT scans the list of running tasks and checks whether any task's process-tree has exceeded its limit. If so, it kills the tree by issuing 'kill <pid>' (SIGTERM) to each process in the concerned process-tree, then monitors whether the tree died; if not, it issues a subsequent SIGKILL.

- ProcfsBasedProcessTree constructs the process-tree information from the /proc file system: the TT obtains the pid from the task via the RPC method getPid, then builds the tree using procfs.
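To make the per-task object concrete, here is a rough sketch of what the ProcessTree contract could look like. Only the type names ProcessTree and ProcfsBasedProcessTree come from the proposal itself; the method names below are illustrative and open to discussion:

{code}
// Rough sketch of the proposed per-task contract; method names are
// illustrative, only the type name comes from the proposal.
public interface ProcessTree {
  /** Re-read process information for the tree rooted at the task's pid. */
  void update();

  /** Cumulative virtual memory, in bytes, of all processes in the tree. */
  long getCumulativeVmem();

  /** Send SIGTERM to every process in the tree. */
  void terminate();

  /** Send SIGKILL to any process that survived terminate(). */
  void destroy();

  /** True while some process in the tree is still running. */
  boolean isAlive();
}
{code}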
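For the two properties, the limit arithmetic works out as below, assuming the suggested defaults of 12GB and 33%. The helper class is hypothetical; the property names are the ones proposed above:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper spelling out the proposed limit arithmetic.
public class MemoryLimits {
  private static final long DEFAULT_MAX = 12L * 1024 * 1024 * 1024; // 12GB

  /** Memory usable across all map tasks on this tracker. */
  public static long mapLimit(Configuration conf) {
    long maxMemory = conf.getLong("mapred.tasktracker.tasks.maxmemory",
                                  DEFAULT_MAX);
    int mapPercent = conf.getInt("mapred.map.memlimit.percent", 33);
    return maxMemory * mapPercent / 100;
  }

  /** Reduces get whatever percentage the maps do not. */
  public static long reduceLimit(Configuration conf) {
    long maxMemory = conf.getLong("mapred.tasktracker.tasks.maxmemory",
                                  DEFAULT_MAX);
    int mapPercent = conf.getInt("mapred.map.memlimit.percent", 33);
    return maxMemory * (100 - mapPercent) / 100;
  }
}
{code}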
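The per-heartbeat check would then be along these lines. Everything except the two-step SIGTERM/SIGKILL policy is a placeholder; accessors like getProcessTree() and limitFor() do not exist yet:

{code}
// Sketch of the per-heartbeat scan inside the TaskTracker.
for (TaskInProgress tip : runningTasks) {
  ProcessTree tree = tip.getProcessTree();   // hypothetical accessor
  tree.update();
  if (tree.getCumulativeVmem() > limitFor(tip)) {
    tree.terminate();                        // SIGTERM the whole tree
    pendingKill.add(tree);                   // re-check on a later pass
  }
}
// Later pass: escalate for trees that ignored SIGTERM.
for (ProcessTree tree : pendingKill) {
  if (tree.isAlive()) {
    tree.destroy();                          // SIGKILL
  }
}
{code}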
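As for the /proc walk itself: for every numeric directory under /proc, read the ppid (field 4 of /proc/<pid>/stat) and vsize (field 23), then keep the processes whose parent chain leads back to the task's root pid. A bare-bones sketch (java.io.* and java.util.* imports assumed):

{code}
// Bare-bones sketch of building parent/vmem maps from /proc. A real
// implementation must handle spaces inside the (comm) field, typically
// with a regex rather than a whitespace split.
Map<Integer, Integer> parentOf = new HashMap<Integer, Integer>();
Map<Integer, Long> vmemOf = new HashMap<Integer, Long>();
for (File procDir : new File("/proc").listFiles()) {
  if (!procDir.getName().matches("[0-9]+")) {
    continue;                                // pid directories only
  }
  try {
    BufferedReader in = new BufferedReader(
        new FileReader(new File(procDir, "stat")));
    String[] f = in.readLine().split("\\s+");
    in.close();
    int pid = Integer.parseInt(f[0]);
    parentOf.put(pid, Integer.parseInt(f[3])); // field 4: ppid
    vmemOf.put(pid, Long.parseLong(f[22]));    // field 23: vsize, bytes
  } catch (IOException e) {
    // process exited between listing and reading; skip it
  }
}
// Walking parentOf from each pid back to the task's root pid yields the
// members of that task's process-tree.
{code}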
Please put forward your objections to the above proposal, if any.

> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
>                 Key: HADOOP-3581
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3581
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can become extremely memory intensive, perhaps due to inadvertent bugs in the user code or to the amount of data being processed. When this happens, the user tasks start to interfere with the proper execution of other processes on the node, including other Hadoop daemons such as the DataNode and TaskTracker, and the node becomes unusable for any Hadoop tasks. There should be a way to prevent such tasks from bringing down the node.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.