Message-ID: <16826792.172841288616246024.JavaMail.jira@thor>
Date: Mon, 1 Nov 2010 08:57:26 -0400 (EDT)
From: "Liyin Liang (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Subject: [jira] Created: (MAPREDUCE-2168) We should implement limits on shuffle connections to TaskTracker per job

We should implement limits on shuffle connections to TaskTracker per job
-------------------------------------------------------------------------

                 Key: MAPREDUCE-2168
                 URL:
https://issues.apache.org/jira/browse/MAPREDUCE-2168
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Liyin Liang

Because the outputs of trailing map tasks are fetched by all reduces simultaneously, all the worker threads of a TaskTracker's HTTP server may be occupied by a single job's reduce tasks fetching map outputs. The TaskTracker's iowait and load then become very high (100+ in our cluster, where we set tasktracker.http.threads to 100). What's more, other jobs' reduces have to wait some time (possibly several minutes) to connect to the TaskTracker and fetch their maps' outputs.

So I think we should implement limits on shuffle connections:
1. limit the number of worker threads (perhaps as a percentage) that can be occupied by the same job's reduces;
2. limit the number of worker threads serving the same map output simultaneously.

Thoughts?

ps: we are using hadoop 0.19.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
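
For illustration, the two limits proposed in the issue could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not TaskTracker code: the class name ShuffleConnectionLimiter, its constructor parameters, and the job and map identifiers are all hypothetical. It assumes the map-output servlet would call tryAcquire before serving a fetch and release once the fetch finishes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical helper (not part of Hadoop) that caps how many HTTP worker
 * threads one job, and one map output, may occupy at the same time.
 */
final class ShuffleConnectionLimiter {
    private final int maxPerJob;
    private final int maxPerMapOutput;
    // Entries are never evicted in this sketch; a real version would
    // clean them up when a job completes.
    private final ConcurrentMap<String, Semaphore> jobPermits = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Semaphore> mapPermits = new ConcurrentHashMap<>();

    /**
     * @param httpThreads     size of the worker pool (e.g. tasktracker.http.threads)
     * @param maxJobFraction  assumed fraction of the pool one job may occupy
     * @param maxPerMapOutput assumed cap on concurrent fetches of one map output
     */
    ShuffleConnectionLimiter(int httpThreads, double maxJobFraction, int maxPerMapOutput) {
        this.maxPerJob = Math.max(1, (int) (httpThreads * maxJobFraction));
        this.maxPerMapOutput = Math.max(1, maxPerMapOutput);
    }

    /** Returns true if the fetch may proceed; never blocks. */
    boolean tryAcquire(String jobId, String mapId) {
        Semaphore job = jobPermits.computeIfAbsent(jobId, k -> new Semaphore(maxPerJob));
        if (!job.tryAcquire()) {
            return false; // limit 1: this job already holds its share of threads
        }
        Semaphore map = mapPermits.computeIfAbsent(mapId, k -> new Semaphore(maxPerMapOutput));
        if (!map.tryAcquire()) {
            job.release(); // roll back the job permit taken above
            return false;  // limit 2: this map output is already being served enough
        }
        return true;
    }

    /** Must be called exactly once after each successful tryAcquire. */
    void release(String jobId, String mapId) {
        Semaphore map = mapPermits.get(mapId);
        Semaphore job = jobPermits.get(jobId);
        if (map != null) {
            map.release();
        }
        if (job != null) {
            job.release();
        }
    }
}
```

A rejected fetch would presumably be retried by the reduce after a backoff, so the worker thread is freed for another job's reduces instead of queueing behind the same job.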