Return-Path: X-Original-To: apmail-hadoop-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2A47D17474 for ; Mon, 23 Mar 2015 11:13:26 +0000 (UTC) Received: (qmail 92005 invoked by uid 500); 23 Mar 2015 11:13:19 -0000 Delivered-To: apmail-hadoop-user-archive@hadoop.apache.org Received: (qmail 91897 invoked by uid 500); 23 Mar 2015 11:13:19 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 91887 invoked by uid 99); 23 Mar 2015 11:13:19 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Mar 2015 11:13:19 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ranadip.c@gmail.com designates 209.85.212.176 as permitted sender) Received: from [209.85.212.176] (HELO mail-wi0-f176.google.com) (209.85.212.176) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Mar 2015 11:12:53 +0000 Received: by wixw10 with SMTP id w10so58811295wix.0 for ; Mon, 23 Mar 2015 04:11:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=ZDSYyOdgOAxF38y9fHUDE3lLcj1hoeXRRFh6Kc/BebY=; b=EaflhvD7GzYr8jkGiuUTl6sSzDOMzq+6ufOubT0GrWu+S6Z9w7zK3feujMrjSYoe2h Q2/3JE+qY9NCfthD7QTSCeGrFXzdipsGKxL6yEaPn1yg2s01c86j5kqxDtMc5wQ0D5yx eZ9e9AmB2mmXONP7CnWAF42APlqOfdwxa9/o5Z/OB0ZpGJ8GlH3L4ggxqcEcOLsCdWLR ogu/fhGCgL9sTwcn2W4Xb1cJoAhS80qDRrIdbiYgvN4HHruVdoAQW9AxWBsNp8s1oMy3 VYxfsGKecNaU/cC28Ot64LwwHtRdw6UmkFn05U7wFLlHAFqqBZhvZ3M6gOBIYaJBhDQq xOoA== 
MIME-Version: 1.0 X-Received: by 10.194.176.4 with SMTP id ce4mr19902718wjc.75.1427109082226; Mon, 23 Mar 2015 04:11:22 -0700 (PDT) Received: by 10.194.63.169 with HTTP; Mon, 23 Mar 2015 04:11:22 -0700 (PDT) In-Reply-To: References: Date: Mon, 23 Mar 2015 11:11:22 +0000 Message-ID: Subject: Re: [External] Re: HDFS Block Bad Response Error From: Ranadip Chatterjee To: user@hadoop.apache.org Content-Type: multipart/alternative; boundary=089e0141a3e0e6e5380511f2b91f X-Virus-Checked: Checked by ClamAV on apache.org --089e0141a3e0e6e5380511f2b91f Content-Type: text/plain; charset=UTF-8 You could check which block that file belongs to by running: $> hadoop fsck / -files -blocks | grep "blk_1084609656_11045296" -B 2 On 20 March 2015 at 14:56, Shipper, Jay [USA] wrote: > > I just checked the input data and the output data (what the job managed > to output before failing), and there are no bad blocks in either. > > From: Ranadip Chatterjee > Reply-To: "user@hadoop.apache.org" > Date: Thursday, March 19, 2015 3:51 AM > To: "user@hadoop.apache.org" > Subject: [External] Re: HDFS Block Bad Response Error > > Have you tried hdfs fsck command to try and catch any inconsistencies > with that block? 
> On 16 Mar 2015 19:39, "Shipper, Jay [USA]" wrote: > >> On a Hadoop 2.4.0 cluster, I have a job running that's encountering the >> following warnings in one of its map tasks (IPs changed, but otherwise, >> this is verbatim): >> >> --- >> 2015-03-16 06:59:37,994 WARN [ResponseProcessor for block >> BP-437460642-10.0.0.1-1391018641114:blk_1084609656_11045296] >> org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor >> exception for block >> BP-437460642-10.0.0.1-1391018641114:blk_1084609656_11045296 >> java.io.EOFException: Premature EOF: no length prefix available >> at >> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1990) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176) >> at >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796) >> 2015-03-16 06:59:37,994 WARN [ResponseProcessor for block >> BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295] >> org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor >> exception for block >> BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295 >> java.io.IOException: Bad response ERROR for block >> BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295 from datanode >> 10.0.0.1:1019 >> at >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819) >> --- >> >> This job is launched from Hive 0.13.0, and it's consistently happening >> on the same split, which is on a sequence file. After logging a few errors >> like the above, the map task seems to make no progress and eventually times >> out (with a mapreduce.task.timeout value greater than 5 hours). >> >> Any pointers on how to begin troubleshooting and resolving this issue? >> In searching around, it was suggested that this is indicative of a "network >> issue", but as it happens on the same split consistently, that explanation >> seems unlikely. 
>> > -- Regards, Ranadip Chatterjee --089e0141a3e0e6e5380511f2b91f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
You could check which file that block belongs to by running:

$> hadoop fsck / -files -blocks | grep "blk_1084609656_11045296" -B 2
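
One caveat with "-B 2": as far as I know, fsck prints each file path once, followed by one line per block, so for a file with more than two blocks the path can fall outside the two lines of context. Below is a minimal sketch of an alternative, assuming file lines start with "/" and block lines contain the block ID. The helper name find_block_owner is hypothetical and not part of Hadoop. Since the warning comes from DFSOutputStream (the write path), the block may belong to a file still open for write, which fsck only lists with -openforwrite.

```shell
# Hypothetical helper (not a Hadoop command): print the file line that owns
# a given block ID, followed by the matching block line.
# Reads `hdfs fsck <path> -files -blocks` style output on stdin.
# Assumption: file lines start with "/" and block lines contain the block ID.
find_block_owner() {
  block_id="$1"
  awk -v blk="$block_id" '
    /^\// { current = $0 }                      # remember the most recent file line
    index($0, blk) { print current; print $0 }  # block found: print owner, then block
  '
}

# Usage against a live cluster (not executed here):
#   hdfs fsck / -files -blocks -openforwrite | find_block_owner blk_1084609656_11045296
```

On Hadoop 2.x, "hdfs fsck" is the preferred spelling; "hadoop fsck" still works but prints a deprecation notice.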


On 20 March 2015 at 14:56, Shipper, Jay [USA] <Shipper_Jay@bah.com> wrote:

I just checked the input data and the output data (what the job managed to output before failing), and there are no bad blocks in either.

From: Ranadip Chatterjee <ranadip.c@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Thursday, March 19, 2015 3:51 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: [External] Re: HDFS Block Bad Response Error

Have you tried hdfs fsck command to try and catch any inconsistencies with that block?

On 16 Mar 2015 19:39, "Shipper, Jay [USA]" <Shipper_Jay@bah.com> wrote:
On a Hadoop 2.4.0 cluster, I have a job running that's encountering the following warnings in one of its map tasks (IPs changed, but otherwise, this is verbatim):

---
2015-03-16 06:59:37,994 WARN [ResponseProcessor for block BP-437460642-10.0.0.1-1391018641114:blk_1084609656_11045296] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-437460642-10.0.0.1-1391018641114:blk_1084609656_11045296
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1990)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)
2015-03-16 06:59:37,994 WARN [ResponseProcessor for block BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295
java.io.IOException: Bad response ERROR for block BP-437460642-10.0.0.1-1391018641114:blk_1084609655_11045295 from datanode 10.0.0.1:1019
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)
---

This job is launched from Hive 0.13.0, and it's consistently happening on the same split, which is on a sequence file. After logging a few errors like the above, the map task seems to make no progress and eventually times out (with a mapreduce.task.timeout value greater than 5 hours).

Any pointers on how to begin troubleshooting and resolving this issue? In searching around, it was suggested that this is indicative of a "network issue", but as it happens on the same split consistently, that explanation seems unlikely.



--
Regards,
Ranadip Chatterjee
--089e0141a3e0e6e5380511f2b91f--