hadoop-hdfs-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
Date Wed, 01 Feb 2012 08:12:28 GMT
Hi,

+ hdfs-user (bcc'd)

Which JRE version are you using?

- Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

> Hi,
> 
> 
> I'm using Hive to do some log analysis, and I have encountered a problem.
> 
> My cluster has 3 nodes: one for NameNode/JobTracker and the other two for DataNode/TaskTracker.
> 
> One of the TaskTrackers repeatedly receives KillJobAction and then deletes unknown jobs.
> 
> The logs look like:
> 
> 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
> 
> This happens occasionally, and when it does, this TaskTracker does nothing but keep receiving KillJobAction and deleting unknown jobs, so performance drops.
> 
> To solve this problem, I have to restart the cluster.
> But obviously, this is not a good solution.
> 
> These jobs are eventually run on the other TaskTracker, where they run fine and the job succeeds.
> 
> Has anybody encountered this problem, and can you give me some advice?
> 
> Occasionally there are also error logs like:
> 
> 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
> 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
> 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
>         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)  
> 
> Is there a connection between these two errors?
> 
> thank you very much!
> 
> xiaobin
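
To see how often the affected TaskTracker gets into this state, the quoted log lines could be tallied per job ID. The following is only a minimal sketch and not something from the thread: the function name `summarize` and the `SAMPLE` list are made up for illustration, the regular expressions are derived solely from the log lines quoted above, and it assumes each log entry sits on a single line, as in the real log file rather than the wrapped mail text.

```python
import re
from collections import Counter

# Patterns derived from the TaskTracker log lines quoted in the message above.
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")


def summarize(lines):
    """Count KillJobAction and 'Unknown job ... being deleted' events per job ID."""
    kills, unknowns = Counter(), Counter()
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            kills[m.group(1)] += 1
        m = UNKNOWN_RE.search(line)
        if m:
            unknowns[m.group(1)] += 1
    return kills, unknowns


# Sample input copied from the quoted log (one entry per line).
SAMPLE = [
    "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0381",
    "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0381 being deleted.",
    "2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0383",
    "2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0383 being deleted.",
]

if __name__ == "__main__":
    kills, unknowns = summarize(SAMPLE)
    for job in sorted(unknowns):
        print(job, "kill actions:", kills[job], "unknown-job warnings:", unknowns[job])
```

For a real log, `summarize` can be given an open file object instead of `SAMPLE`, since it only iterates over lines. A job that shows a kill action plus an unknown-job warning on one TaskTracker but never appears in that node's task launch lines is one that was never actually started there.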

