hadoop-hdfs-user mailing list archives

From Rahul Singh <smart.rahul.i...@gmail.com>
Subject Re: hadoop inconsistent behaviour
Date Mon, 14 Apr 2014 07:11:14 GMT
I cleaned up the log directory before running the job. Now there are no
nodemanager logs. When I look in the userlogs directory, I find some syslog
files with the following error:

2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From poc-hadoop06/
127.0.1.1 to poc-hadoop06:40221 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1359)
        at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
        at $Proxy6.getTask(Unknown Source)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
        at
org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
        at org.apache.hadoop.ipc.Client.call(Client.java:1377)
        ... 4 more

2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask
metrics system...
2014-04-14 11:58:25,478 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system stopped.
2014-04-14 11:58:25,479 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics
system shutdown complete.
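The retries in the syslog above follow Hadoop's RetryUpToMaximumCountWithFixedSleep policy (up to 10 attempts, 1000 ms apart) before the ConnectException surfaces. A minimal sketch of that fixed-sleep retry loop, in Python for illustration (the connect_once callable is a hypothetical stand-in for the real connection setup in org.apache.hadoop.ipc.Client):

```python
import time

def retry_fixed_sleep(connect_once, max_retries=10, sleep_ms=1000):
    """Retry a connection attempt up to max_retries times with a fixed
    sleep between attempts, mirroring RetryUpToMaximumCountWithFixedSleep."""
    attempts = 0
    while True:
        try:
            return connect_once()
        except ConnectionRefusedError:
            attempts += 1
            if attempts >= max_retries:
                # Give up: this is the point where the ConnectException
                # above is raised to the caller.
                raise
            time.sleep(sleep_ms / 1000.0)

# A stub that always refuses, to show the attempt counting.
calls = []
def always_refused():
    calls.append(1)
    raise ConnectionRefusedError("Connection refused")
```

Note also that the log shows poc-hadoop06 resolving to the loopback alias 127.0.1.1; on Debian/Ubuntu-style /etc/hosts setups that mapping is a common cause of exactly this kind of intermittent connection refusal in multi-node clusters, and is worth checking alongside the ConnectionRefused wiki page the exception links to.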



On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <sxin@gopivotal.com> wrote:

> You can check the nodemanager log
>
>
> On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh <smart.rahul.iiit@gmail.com>wrote:
>
>> Hi,
>>   I am running a job (the wordcount example) on a 3-node cluster (1 master
>> and 2 slaves). Sometimes the job passes, but sometimes it fails (the reduce
>> phase fails; the input data is only a few KB).
>>   I am not able to nail down the reason for this inconsistency.
>>
>>
>>
>> Failed job log:
>>
>> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0003
>> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0003
>> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0003/
>> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003
>> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running
>> in uber mode : false
>> 14/04/14 11:57:34 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/04/14 11:57:40 INFO mapreduce.Job:  map 100% reduce 0%
>> 14/04/14 11:57:46 INFO mapreduce.Job:  map 100% reduce 13%
>> 14/04/14 11:57:48 INFO mapreduce.Job:  map 100% reduce 25%
>> 14/04/14 11:57:49 INFO mapreduce.Job:  map 100% reduce 38%
>> 14/04/14 11:57:50 INFO mapreduce.Job:  map 100% reduce 50%
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_0, Status : FAILED
>> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000001_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_0, Status : FAILED
>> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000007_0, Status : FAILED
>> 14/04/14 11:58:02 INFO mapreduce.Job:  map 100% reduce 63%
>> 14/04/14 11:58:04 INFO mapreduce.Job:  map 100% reduce 75%
>> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_1, Status : FAILED
>> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_1, Status : FAILED
>> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000003_2, Status : FAILED
>> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0003_r_000005_2, Status : FAILED
>> 14/04/14 11:58:40 INFO mapreduce.Job:  map 100% reduce 100%
>> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed
>> with state FAILED due to: Task failed task_1397454060494_0003_r_000003
>> Job failed as tasks failed. failedMaps:0 failedReduces:1
>>
>> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51
>>     File System Counters
>>         FILE: Number of bytes read=80
>>         FILE: Number of bytes written=596766
>>         FILE: Number of read operations=0
>>         FILE: Number of large read operations=0
>>         FILE: Number of write operations=0
>>         HDFS: Number of bytes read=175
>>         HDFS: Number of bytes written=28
>>         HDFS: Number of read operations=21
>>         HDFS: Number of large read operations=0
>>         HDFS: Number of write operations=12
>>     Job Counters
>>         Failed reduce tasks=9
>>         Killed reduce tasks=1
>>         Launched map tasks=1
>>         Launched reduce tasks=16
>>         Data-local map tasks=1
>>         Total time spent by all maps in occupied slots (ms)=3477
>>         Total time spent by all reduces in occupied slots (ms)=148867
>>         Total time spent by all map tasks (ms)=3477
>>         Total time spent by all reduce tasks (ms)=148867
>>         Total vcore-seconds taken by all map tasks=3477
>>         Total vcore-seconds taken by all reduce tasks=148867
>>         Total megabyte-seconds taken by all map tasks=3560448
>>         Total megabyte-seconds taken by all reduce tasks=152439808
>>     Map-Reduce Framework
>>         Map input records=3
>>         Map output records=13
>>         Map output bytes=110
>>         Map output materialized bytes=112
>>         Input split bytes=117
>>         Combine input records=13
>>         Combine output records=6
>>         Reduce input groups=4
>>         Reduce shuffle bytes=80
>>         Reduce input records=4
>>         Reduce output records=4
>>         Spilled Records=10
>>         Shuffled Maps =6
>>         Failed Shuffles=0
>>         Merged Map outputs=6
>>         GC time elapsed (ms)=142
>>         CPU time spent (ms)=6420
>>         Physical memory (bytes) snapshot=1100853248
>>         Virtual memory (bytes) snapshot=4468314112
>>         Total committed heap usage (bytes)=1406992384
>>     Shuffle Errors
>>         BAD_ID=0
>>         CONNECTION=0
>>         IO_ERROR=0
>>         WRONG_LENGTH=0
>>         WRONG_MAP=0
>>         WRONG_REDUCE=0
>>     File Input Format Counters
>>         Bytes Read=58
>>     File Output Format Counters
>>         Bytes Written=28
>>
>> Successful job log:
>> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar
>> wordcount /user/hduser/input /user/hduser/output_wordcount9
>> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at /
>> 20.0.1.206:8032
>> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1
>> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397454060494_0002
>> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application
>> application_1397454060494_0002
>> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job:
>> http://20.0.1.206:8088/proxy/application_1397454060494_0002/
>> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002
>> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running
>> in uber mode : false
>> 14/04/14 11:47:36 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_0, Status : FAILED
>> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_1, Status : FAILED
>> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_m_000000_2, Status : FAILED
>> 14/04/14 11:48:26 INFO mapreduce.Job:  map 100% reduce 0%
>> 14/04/14 11:48:34 INFO mapreduce.Job:  map 100% reduce 13%
>> 14/04/14 11:48:35 INFO mapreduce.Job:  map 100% reduce 25%
>> 14/04/14 11:48:37 INFO mapreduce.Job:  map 100% reduce 50%
>> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000001_0, Status : FAILED
>> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000003_0, Status : FAILED
>> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_0, Status : FAILED
>> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000007_0, Status : FAILED
>> 14/04/14 11:48:50 INFO mapreduce.Job:  map 100% reduce 63%
>> 14/04/14 11:48:51 INFO mapreduce.Job:  map 100% reduce 75%
>> 14/04/14 11:48:52 INFO mapreduce.Job:  map 100% reduce 88%
>> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id :
>> attempt_1397454060494_0002_r_000005_1, Status : FAILED
>> 14/04/14 11:49:05 INFO mapreduce.Job:  map 100% reduce 100%
>> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002
>> completed successfully
>> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52
>>     File System Counters
>>         FILE: Number of bytes read=112
>>         FILE: Number of bytes written=767175
>>         FILE: Number of read operations=0
>>         FILE: Number of large read operations=0
>>         FILE: Number of write operations=0
>>         HDFS: Number of bytes read=175
>>         HDFS: Number of bytes written=40
>>         HDFS: Number of read operations=27
>>         HDFS: Number of large read operations=0
>>         HDFS: Number of write operations=16
>>     Job Counters
>>         Failed map tasks=3
>>         Failed reduce tasks=5
>>         Launched map tasks=4
>>         Launched reduce tasks=13
>>         Other local map tasks=3
>>         Data-local map tasks=1
>>         Total time spent by all maps in occupied slots (ms)=41629
>>         Total time spent by all reduces in occupied slots (ms)=104530
>>         Total time spent by all map tasks (ms)=41629
>>         Total time spent by all reduce tasks (ms)=104530
>>         Total vcore-seconds taken by all map tasks=41629
>>         Total vcore-seconds taken by all reduce tasks=104530
>>         Total megabyte-seconds taken by all map tasks=42628096
>>         Total megabyte-seconds taken by all reduce tasks=107038720
>>     Map-Reduce Framework
>>         Map input records=3
>>         Map output records=13
>>         Map output bytes=110
>>         Map output materialized bytes=112
>>         Input split bytes=117
>>         Combine input records=13
>>         Combine output records=6
>>         Reduce input groups=6
>>         Reduce shuffle bytes=112
>>         Reduce input records=6
>>         Reduce output records=6
>>         Spilled Records=12
>>         Shuffled Maps =8
>>         Failed Shuffles=0
>>         Merged Map outputs=8
>>         GC time elapsed (ms)=186
>>         CPU time spent (ms)=8890
>>         Physical memory (bytes) snapshot=1408913408
>>         Virtual memory (bytes) snapshot=5727019008
>>         Total committed heap usage (bytes)=1808990208
>>     Shuffle Errors
>>         BAD_ID=0
>>         CONNECTION=0
>>         IO_ERROR=0
>>         WRONG_LENGTH=0
>>         WRONG_MAP=0
>>         WRONG_REDUCE=0
>>     File Input Format Counters
>>         Bytes Read=58
>>     File Output Format Counters
>>         Bytes Written=40
>>
>> Thanks and Regards,
>> -Rahul Singh
>>
>
>
>
> --
> Regards
> Shengjun
>
