Subject: Re: Socket timeout for BlockReaderLocal
From: Robert Molina <rmolina@hortonworks.com>
To: user@hadoop.apache.org
Date: Tue, 4 Dec 2012 11:54:40 -0800

Hi Haitao,
To help isolate the problem, what happens if you run a different job? Also, if you view the namenode web UI, or the web UI of the specific datanode having the issue, are there any indicators of it being down?

Regards,
Robert

On Tue, Dec 4, 2012 at 12:49 AM, panfei <cnweike@gmail.com> wrote:

> I noticed that you are using JDK 1.7; personally I prefer 1.6.x.
> If your firewall is OK, you can check your RPC service to see whether it
> is also OK, for example by testing it with: telnet 10.130.110.80 50020
> I suggested Hive because HQL (SQL-like) is familiar to most people, and
> the learning curve is smooth.
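As a rough Java equivalent of the telnet test panfei suggests, here is a minimal sketch (host and port are the ones from this thread). Note the caveat: a successful connect only shows the port accepts TCP connections; a datanode whose RPC handlers are stalled can still accept a connection and then hang, which would match the symptoms quoted below.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Quick TCP probe of the datanode IPC port, like `telnet 10.130.110.80 50020`.
public class PortProbe {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket()) {
            // 10s connect timeout, matching the 10000 ms seen in the stack trace.
            s.connect(new InetSocketAddress("10.130.110.80", 50020), 10000);
            System.out.println("connected: " + s.getRemoteSocketAddress());
        }
    }
}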
> 2012/12/4 Haitao Yao <yao.erix@gmail.com>
>
>> The firewall is OK.
>> Well, personally I prefer Pig. And it's a big project; switching from
>> Pig to Hive would not be easy.
>> Thanks.
>>
>> Haitao Yao
>> yao.erix@gmail.com
>> weibo: @haitao_yao
>> Skype: haitao.yao.final
>>
>> On 2012-12-4, at 3:14 PM, panfei <cnweike@gmail.com> wrote:
>>
>> Check your firewall settings, please. And why not use Hive to do the work?
>>
>> 2012/12/4 Haitao Yao <yao.erix@gmail.com>
>>
>>> hi, all
>>> I'm using Hadoop 1.2.0, java version "1.7.0_05".
>>> When running my Pig script, the workers always report the error below,
>>> and the MR jobs run very slowly.
>>> Increasing the dfs.socket.timeout value does not help. The network is
>>> OK; telnet to port 50020 always succeeds.
>>> Here's the stack trace:
>>>
>>> 2012-12-04 14:29:41,323 INFO org.apache.hadoop.hdfs.DFSClient: Failed to read blk_-2337696885631113108_11054058 on local machine
>>> java.net.SocketTimeoutException: Call to /10.130.110.80:50020 failed on socket timeout exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.130.110.80:57689 remote=/10.130.110.80:50020]
>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1140)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>     at $Proxy3.getProtocolVersion(Unknown Source)
>>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
>>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
>>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:374)
>>>     at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:212)
>>>     at org.apache.hadoop.hdfs.BlockReaderLocal$LocalDatanodeInfo.getDatanodeProxy(BlockReaderLocal.java:90)
>>>     at org.apache.hadoop.hdfs.BlockReaderLocal$LocalDatanodeInfo.access$200(BlockReaderLocal.java:65)
>>>     at org.apache.hadoop.hdfs.BlockReaderLocal.getBlockPathInfo(BlockReaderLocal.java:224)
>>>     at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:145)
>>>     at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:509)
>>>     at org.apache.hadoop.hdfs.DFSClient.access$800(DFSClient.java:78)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2231)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2384)
>>>     at java.io.DataInputStream.read(DataInputStream.java:149)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>     at org.apache.pig.impl.io.BufferedPositionedInputStream.read(BufferedPositionedInputStream.java:52)
>>>     at org.apache.pig.impl.io.InterRecordReader.nextKeyValue(InterRecordReader.java:86)
>>>     at org.apache.pig.impl.io.InterStorage.getNext(InterStorage.java:77)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
>>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> Caused by: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.130.110.80:57689 remote=/10.130.110.80:50020]
>>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
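A side note on the dfs.socket.timeout attempt mentioned above: the trace shows the 10000 ms timeout firing inside org.apache.hadoop.ipc.Client (the IPC path BlockReaderLocal uses to fetch block path info), and that value does not obviously track dfs.socket.timeout, which may be why raising the property had no effect here. For reference only, a hedged sketch of setting the property programmatically (the same key can go in hdfs-site.xml; 60000 is just an example value):

import org.apache.hadoop.conf.Configuration;

// Illustrative only: raising the HDFS socket read timeout from client code.
public class TimeoutConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("dfs.socket.timeout", 60000); // 60 s, example value
        System.out.println("dfs.socket.timeout = " + conf.get("dfs.socket.timeout"));
    }
}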
>>> I checked the source code; the exception is thrown here:
>>>
>>>     // now wait for socket to be ready.
>>>     int count = 0;
>>>     try {
>>>       count = selector.select(channel, ops, timeout);
>>>     } catch (IOException e) { // unexpected IOException.
>>>       closed = true;
>>>       throw e;
>>>     }
>>>
>>>     if (count == 0) {
>>>       // here!!
>>>       throw new SocketTimeoutException(timeoutExceptionString(channel, timeout, ops));
>>>     }
>>>
>>> Why did the selector select nothing? The datanode is not under heavy
>>> load; GC and network are all fine.
>>> Thanks.
>>>
>>> Haitao Yao
>>> yao.erix@gmail.com
>>> weibo: @haitao_yao
>>> Skype: haitao.yao.final
>>
>> --
>> If you don't learn, you won't know.
>
> --
> If you don't learn, you won't know.
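To make the count == 0 branch quoted above concrete: select() returning zero just means the channel never became readable within the timeout window, even though the TCP connection itself is healthy. Below is a minimal, self-contained sketch of the same pattern (hypothetical loopback setup; the "server" accepts the connection at TCP level but never writes, so the select times out the way a stalled datanode RPC handler would):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Standalone sketch of the select-with-timeout pattern quoted above:
// select() == 0 means nothing became readable in time, so the caller
// raises SocketTimeoutException even though the connection is fine.
public class SelectTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            // Connect, but the server side never reads or writes anything.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 Selector selector = Selector.open()) {
                client.configureBlocking(false); // required before register()
                client.register(selector, SelectionKey.OP_READ);
                int count = selector.select(1000); // wait up to 1 s for readability
                if (count == 0) {
                    // Same branch as the Hadoop code: nothing selected => timeout.
                    throw new SocketTimeoutException("1000 millis timeout while waiting for read");
                }
            }
        }
    }
}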