From: Harsh J
Date: Tue, 22 May 2012 19:43:05 +0530
Subject: Re: Map/Reduce Tasks Fails
To: common-user@hadoop.apache.org

Sandeep,

Is the same DN 10.0.25.149 reported across all failures? And do you
notice any machine patterns when observing the failed tasks (i.e. are
they clumped on any one or a few particular TTs repeatedly)?

On Tue, May 22, 2012 at 7:32 PM, Sandeep Reddy P wrote:
> Hi,
> We have a 5node cdh3u4 cluster running. When i try to do teragen/terasort
> some of the map tasks are Failed/Killed and the logs show similar error on
> all machines.
>
> 2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream 10.0.25.149:50010
> java.net.SocketTimeoutException: 69000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.0.25.149:55835
> remote=/10.0.25.149:50010]
> 2012-05-22 09:44:25,968 INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_7260720956806950576_1825
> 2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient:
> Excluding datanode 10.0.25.149:50010
> 2012-05-22 09:46:36,350 WARN org.apache.hadoop.mapred.Task: Parent
> died.  Exiting attempt_201205211504_0007_m_000016_1.
>
>
>
> Are these kind of errors common?? Atleast 1 map task is failing due to
> above reason on all the machines.We are using 24 mappers for teragen.
> For us it took 3hrs 44min 17 sec to generate 50Gb data with 24 mappers
> and 17failed/8 killed task attempts.
>
> 24min 10 sec for 5GB data with 24 mappers and 9 killed Task attempts.
> Cluster works good for small datasets.

-- 
Harsh J
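
One way to answer the question of whether the same DN shows up across all failures is to tally the DataNode addresses that appear in the DFSClient messages of the task logs. The following is a minimal sketch, assuming each log entry sits on a single line in the format quoted in this thread; the function name `count_datanode_failures` and the sample lines are illustrative only and not part of Hadoop.

```python
import re
from collections import Counter

# Matches the two DFSClient messages quoted in the thread, e.g.
#   "Exception in createBlockOutputStream 10.0.25.149:50010"
#   "Excluding datanode 10.0.25.149:50010"
# Assumption: the address is always an IPv4 address followed by a port.
DN_PATTERN = re.compile(
    r"(?:Excluding datanode|Exception in createBlockOutputStream)\s+"
    r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
)


def count_datanode_failures(lines):
    """Count how often each DataNode address appears in failure messages."""
    counts = Counter()
    for line in lines:
        match = DN_PATTERN.search(line)
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    # Sample lines taken from the log excerpt in the thread, unwrapped.
    sample = [
        "2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient: "
        "Exception in createBlockOutputStream 10.0.25.149:50010 "
        "java.net.SocketTimeoutException: 69000 millis timeout",
        "2012-05-22 09:44:25,968 INFO org.apache.hadoop.hdfs.DFSClient: "
        "Abandoning block blk_7260720956806950576_1825",
        "2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient: "
        "Excluding datanode 10.0.25.149:50010",
    ]
    for datanode, hits in count_datanode_failures(sample).most_common():
        print(datanode, hits)
```

Feeding the function the task logs collected from every TaskTracker would show whether one address dominates or whether the failures are spread across the cluster.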