From: Steve Lewis
To: mapreduce-user
Date: Wed, 28 Aug 2013 19:47:39 -0700
Subject: Some jobs seem to run forever

I am running a Hadoop job on a 40-node cluster with about 300 map tasks and about 300 reduce tasks. Most tasks complete within 20 minutes, but a few, typically fewer than 10, run for many hours.

If they complete, I see nothing to suggest that the number of bytes read or written, or the number of records read or written, is significantly different from that of tasks that run much faster. I sometimes see multiple attempts, usually only two, and the cluster is doing nothing else.

Any suggested tuning?