From: Pavel Gutin
Date: Thu, 9 Sep 2010 14:28:10 -0400
Subject: My mappers stop responding even though they reach 100%
To: common-user@hadoop.apache.org

I've been having a problem for the past few weeks. When I kick off a job with a large number of map tasks, at some point in the job (sometimes at 1%, sometimes at 40%) the mappers start reporting the following error:

"Task attempt_201009021455_0033_m_000000_0 failed to report status for 600 seconds. Killing!"

The affected mappers sit at 100% progress until they are killed. Eventually too many mappers fail, and the job fails.

Based on some reading I've done, I added context.progress(); to the map method, but it doesn't help.

I would appreciate any help you can provide. Thank you!
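
One possible reason a single context.progress(); call does not help: if one invocation of map() (or whatever work follows the last record) runs longer than the timeout, no further status report arrives in time. A minimal sketch of one workaround is below, assuming that is the cause: a background thread reports progress at a fixed interval while the slow work runs. The class name Heartbeat and the method runWithHeartbeat are hypothetical, and the Runnable is only a stand-in for the real callback; in an actual mapper it would presumably be context::progress, which is not wired up here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Heartbeat {
    /**
     * Runs slowWork on the calling thread while a daemon thread invokes
     * reportProgress every periodMillis milliseconds. Hypothetical helper,
     * not part of Hadoop.
     */
    public static <T> T runWithHeartbeat(Runnable reportProgress, long periodMillis,
                                         Callable<T> slowWork) throws Exception {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for the heartbeat
            return t;
        });
        // First report fires immediately, then once per period.
        // Note: if reportProgress throws, later executions are suppressed.
        ticker.scheduleAtFixedRate(reportProgress, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return slowWork.call();
        } finally {
            ticker.shutdownNow(); // stop reporting as soon as the work ends or fails
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger reports = new AtomicInteger(); // stand-in for context.progress()
        String result = runWithHeartbeat(reports::incrementAndGet, 50, () -> {
            Thread.sleep(300); // stands in for one slow record
            return "done";
        });
        System.out.println(result + ", progress reported " + reports.get() + " time(s)");
    }
}
```

The trade-off is that a heartbeat like this also keeps a genuinely hung task alive, so it hides the symptom rather than explaining why the work takes so long. It is probably worth finding out what the mapper is doing at the moment it stalls before relying on it.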