hadoop-common-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: DFSClient error
Date Fri, 27 Apr 2012 22:45:43 GMT
After all the jobs fail, I can't run anything. Once I restart the cluster, I
am able to run other jobs with no problems; hadoop fs and other I/O-intensive
jobs run just fine.

On Fri, Apr 27, 2012 at 3:12 PM, John George <johngeo@yahoo-inc.com> wrote:

> Can you run a regular 'hadoop fs' (put or ls or get) command?
> If yes, how about a wordcount example?
> '<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'
>
>
> -----Original Message-----
> From: Mohit Anchlia <mohitanchlia@gmail.com>
> Reply-To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Date: Fri, 27 Apr 2012 14:36:49 -0700
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Subject: Re: DFSClient error
>
> >I even tried reducing the number of jobs, but it didn't help. This is what I see:
> >
> >datanode logs:
> >
> >Initializing secure datanode resources
> >Successfully obtained privileged resources (streaming port =
> >ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port =
> >sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
> >Starting regular datanode initialization
> >26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value
> >of 143
> >
> >userlogs:
> >
> >2012-04-26 19:35:22,801 WARN
> >org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
> >available
> >2012-04-26 19:35:22,801 INFO
> >org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
> >loaded
> >2012-04-26 19:35:22,808 INFO
> >org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> >initialized native-zlib library
> >2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.197:50010, add to deadNodes and continue
> >java.io.EOFException
> >        at java.io.DataInputStream.readShort(DataInputStream.java:298)
> >        at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
> >        at java.io.DataInputStream.read(DataInputStream.java:132)
> >        at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
> >        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
> >        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> >        at java.io.InputStream.read(InputStream.java:85)
> >        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
> >        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
> >        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
> >        at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
> >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
> >        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
> >        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> >        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >        at java.security.AccessController.doPrivileged(Native Method)
> >        at javax.security.auth.Subject.doAs(Subject.java:396)
> >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:264)
> >2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.204:50010, add to deadNodes and continue
> >java.io.EOFException
> >
> >namenode logs:
> >
> >2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job
> >job_201204261140_0244 added successfully for user 'hadoop' to queue
> >'default'
> >2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker:
> >Initializing job_201204261140_0244
> >2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger:
> >USER=hadoop  IP=125.18.62.196        OPERATION=SUBMIT_JOB
> >TARGET=job_201204261140_0244    RESULT=SUCCESS
> >2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress:
> >Initializing job_201204261140_0244
> >2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> >createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
> >connect ack with firstBadLink as 125.18.62.197:50010
> >2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> >block blk_2499580289951080275_22499
> >2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> >datanode 125.18.62.197:50010
> >2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress:
> >jobToken generated and stored with users keys in
> >/data/hadoop/mapreduce/job_201204261140_0244/jobToken
> >2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input
> >size for job job_201204261140_0244 = 73808305. Number of splits = 1
> >2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
> >tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb4.corp.intuit.net
> >2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
> >tip:task_201204261140_0244_m_000000 has split on node:/default-rack/dsdb5.corp.intuit.net
> >2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
> >job_201204261140_0244 LOCALITY_WAIT_FACTOR=0.4
> >2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Job
> >job_201204261140_0244 initialized successfully with 1 map tasks and 0
> >reduce tasks.
> >
> >On Fri, Apr 27, 2012 at 7:50 AM, Mohit Anchlia
> ><mohitanchlia@gmail.com> wrote:
> >
> >>
> >>
> >>  On Thu, Apr 26, 2012 at 10:24 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >>> Is only the same IP printed in all such messages? Can you check the DN
> >>> log in that machine to see if it reports any form of issues?
> >>>
> >> All IPs were logged with this message.
> >>
> >>
> >>> Also, did your jobs fail or keep going despite these hiccups? I notice
> >>> you're threading your clients though (?), but I can't tell if that may
> >>> cause this without further information.
> >>>
> >> It started with this error message, and slowly all the jobs died with
> >> "shortRead" errors.
> >> I am not sure about threading. I am using a Pig script to read .gz files.
> >>
> >>
> >>> On Fri, Apr 27, 2012 at 5:19 AM, Mohit Anchlia <mohitanchlia@gmail.com>
> >>> wrote:
> >>> > I had 20 mappers running in parallel, reading 20 gz files of around
> >>> > 30-40MB each across 5 hadoop nodes and then writing to the analytics
> >>> > database. Almost midway through, it started to get this error:
> >>> >
> >>> >
> >>> > 2012-04-26 16:13:53,723 [Thread-8] INFO
> >>> > org.apache.hadoop.hdfs.DFSClient -
> >>> > Exception in createBlockOutputStream
> >>> > 17.18.62.192:50010 java.io.IOException: Bad connect ack with
> >>> > firstBadLink as
> >>> > 17.18.62.191:50010
> >>> >
> >>> > I am trying to look at the logs but they don't say much. What could be
> >>> > the reason? We are in a pretty closed, reliable network and all machines
> >>> > are up.
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>>
> >>
> >>
>
>
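
Editorial note on the datanode log quoted above: the jsvc line reports
"Service exit with a return value of 143". By the common POSIX shell
convention, an exit status above 128 means 128 plus a signal number, so 143
would correspond to signal 15 (SIGTERM). Whether jsvc follows that convention
here is an assumption, not something the thread confirms; if it does, the
datanode was asked to stop rather than crashing on its own. The helper name
describe_exit_status below is hypothetical, a minimal sketch only:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Assumes the POSIX shell convention that a value above 128 encodes
    128 + signal number. This is an assumption about how the status was
    produced, not something the log itself states.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # No signal with this number is defined on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 143 is the value reported by jsvc.exec in the datanode log.
    print(describe_exit_status(143))
```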

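Editorial note on Harsh's question ("Is only the same IP printed in all such
messages?"): one way to answer it is to tally the datanode addresses named in
the DFSClient messages. The following is a minimal sketch, not a Hadoop tool.
The two regular expressions are modeled only on the message wording quoted in
this thread, and the function name tally_bad_datanodes is hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on the DFSClient messages quoted in this thread.
FAILED_CONNECT = re.compile(r"Failed to connect to /(\d+\.\d+\.\d+\.\d+):(\d+)")
BAD_LINK = re.compile(r"firstBadLink as (\d+\.\d+\.\d+\.\d+):(\d+)")


def tally_bad_datanodes(log_text):
    """Count how often each datanode address is reported as unreachable."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for pattern in (FAILED_CONNECT, BAD_LINK):
        for ip, port in pattern.findall(flat):
            counts[f"{ip}:{port}"] += 1
    return counts


# Sample lines copied from the logs posted in the thread.
SAMPLE = """\
2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.197:50010, add to deadNodes and continue
2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.204:50010, add to deadNodes and continue
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
connect ack with firstBadLink as 125.18.62.197:50010
"""

if __name__ == "__main__":
    for address, count in tally_bad_datanodes(SAMPLE).most_common():
        print(address, count)
```

If one address dominates the tally, that datanode's own log is the place to
look first; if every node appears, as Mohit reports, the cause is more likely
cluster-wide than a single bad machine.
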