Subject: Re: FSDataOutputStream hangs in out.close()
From: Pedro Sá da Costa <psdc1978@gmail.com>
To: mapreduce-user@hadoop.apache.org
Date: Wed, 27 Mar 2013 21:32:07 +0000

I just created 2 different FS instances.
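
A minimal sketch of what that two-instance approach can look like, assuming the Hadoop 1.x-era FileSystem.get(URI, Configuration) API; the NameNode URIs and the output path below are placeholders, not values from this thread:

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class TwoClusterWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      // Placeholder NameNode URIs; substitute the real addresses
      // of the two clusters.
      FileSystem fsA = FileSystem.get(URI.create("hdfs://namenode-a:8020/"), conf);
      FileSystem fsB = FileSystem.get(URI.create("hdfs://namenode-b:8020/"), conf);

      byte[] payload = "hello".getBytes("UTF-8");

      // Write through each cluster's own FileSystem handle, so the
      // client never mixes block locations between the two namespaces.
      for (FileSystem fs : new FileSystem[] { fsA, fsB }) {
        FSDataOutputStream out = fs.create(new Path("/tmp/sample.txt"), true);
        try {
          out.write(payload);
        } finally {
          out.close(); // the call this thread reports as hanging
        }
      }
    }
  }

FileSystem.get caches handles per URI scheme and authority, so fsA and fsB here are genuinely distinct instances; block IDs are only meaningful within a single namespace, which matches Harsh's point below.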
On Wednesday, 27 March 2013, Harsh J wrote:
> Same data does not mean same block IDs across two clusters. I'm
> guessing this is caused by some issue in your code when wanting to
> write to two different HDFS instances with the same client. Did you do
> a low-level mod for HDFS writes as well, or just create two different
> FS instances when you want to write to different ones?
>
> On Wed, Mar 27, 2013 at 9:34 PM, Pedro Sá da Costa <psdc1978@gmail.com>
> wrote:
> > I can add this information taken from the datanode logs; it seems to be
> > something related to blocks:
> >
> > nfoPort=50075, ipcPort=50020):Got exception while serving
> > blk_-4664365259588027316_2050 to /XXX.XXX.XXX.123:
> > java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > 2013-03-27 15:44:54,965 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(XXX.XXX.XXX.123:50010,
> > storageID=DS-595468034-XXX.XXX.XXX.123-50010-1364122596021, infoPort=50075,
> > ipcPort=50020):DataXceiver
> > java.io.IOException: Block blk_-4664365259588027316_2050 is not valid.
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
> >         at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
> >         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > I still have no idea why this error occurs, given that the 2 HDFS
> > instances have the same data.
> >
> >
> > On 27 March 2013 15:53, Pedro Sá da Costa <psdc1978@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm trying to make the same client talk to different HDFS and JT
> >> instances that are in different sites of Amazon EC2. The error that I got
> >> is:
> >>
> >> java.io.IOException: Got error for OP_READ_BLOCK,
> >> self=/XXX.XXX.XXX.123:44734,
> >> remote=ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010,
> >> for file
> >> ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010:-4664365259588027316,
> >> for block -4664365259588027316_2050
> >>
> >> Does this error mean that it wasn't possible to write to a remote host?
> >>
> >>
> >> On 27 March 2013 12:24, Harsh J <harsh@cloudera.com> wrote:
> >>>
> >>> You can try to take a jstack stack trace and see what it's hung on.
> >>> I've only ever noticed a close() hang when the NN does not accept the
> >>> complete-file call (due to minimum replication not being guaranteed),
> >>> but given your changes (which I have no insight into yet) it could be
> >>> something else as well. You're essentially trying to make the same
> >>> client talk to two different FSes, I think (aside from the JT RPC).
> >>>
> >>> On Wed, Mar 27, 2013 at 5:50 PM, Pedro Sá da Costa <psdc1978@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >> --
> > Harsh J

-- 
Best regards,
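
As a side note on the jstack suggestion quoted above: the same stacks can also be dumped from inside the stuck client with the standard Thread.getAllStackTraces() API. A minimal sketch, offered here as an in-process approximation of running jstack <pid> against the client, not code from this thread:

  import java.util.Map;

  public class ThreadDump {
      // Print every live thread's stack trace to stderr, roughly what
      // `jstack <pid>` reports, so you can see which frame the hanging
      // out.close() is blocked in (e.g. waiting on the NameNode to
      // accept the complete-file call, as suggested above).
      public static void dumpAllStacks() {
          for (Map.Entry<Thread, StackTraceElement[]> entry
                  : Thread.getAllStackTraces().entrySet()) {
              System.err.println("Thread: " + entry.getKey().getName());
              for (StackTraceElement frame : entry.getValue()) {
                  System.err.println("    at " + frame);
              }
          }
      }
  }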