hadoop-mapreduce-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: tasktracker keeps receiving KillJobAction and then deletes unknown job while using hive
Date Thu, 02 Feb 2012 08:57:17 GMT
A poorly written job can easily overload a TaskTracker. The first question is why one TT has no problems while the other does. Take a look at the logs on that node. Do you see messages like "0 slots free"? If so, raising the handler count could help.

dfs.namenode.handler.count can be set to 15 or similar. 10 is very moderate.
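
As a rough sketch (assuming the property is added to hdfs-site.xml on the NameNode and the NameNode is restarted afterwards), the change could look like:

   <!-- hdfs-site.xml: number of NameNode RPC handler threads (default 10) -->
   <property>
     <name>dfs.namenode.handler.count</name>
     <value>15</value>
   </property>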

best,
 Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 4:11 PM, Xiaobin She wrote:

> hi Alex,
> 
> I did not set the value of dfs.namenode.handler.count in the config file, so it should be the default value, which is 10.
> 
> I only have two datanodes, is 10 not enough?
> 
> And if it is not enough, why does the tasktracker keep receiving KillJobAction and deleting unknown jobs?
> 
> thank you very much for your help!
> 
> 2012/2/1 alo alt <wget.null@googlemail.com>
> How many namenode handlers (dfs.namenode.handler.count) have you defined for your cluster?
> 
> - Alex
> 
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
> 
> On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
> 
> >
> > hi Alex,
> >
> > I'm using jre 1.6.0_24
> >
> > with hadoop 0.20.0
> > hive 0.80
> >
> > thx
> >
> >
> > 2012/2/1 alo alt <wget.null@googlemail.com>
> > Hi,
> >
> > + hdfs-user (bcc'd)
> >
> > which JRE version do you use?
> >
> > - Alex
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
> >
> > > hi ,
> > >
> > >
> > > I'm using hive to do some log analysis, and I have encountered a problem.
> > >
> > > My cluster has 3 nodes, one for the NameNode/JobTracker and the other two for DataNode/TaskTracker.
> > >
> > > One of the tasktrackers will repeatedly receive KillJobAction and then delete unknown jobs.
> > >
> > > the logs look like:
> > >
> > > 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> > > 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> > > 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> > > 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> > > 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> > > 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> > > 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> > > 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
> > >
> > > This happens occasionally, and when it does, this tasktracker will do nothing but keep receiving KillJobAction and deleting unknown jobs, and thus performance drops.
> > >
> > > To solve this problem, I have to restart the cluster,
> > > but obviously this is not a good solution.
> > >
> > > These jobs will eventually be run on the other tasktracker, and they will run well; the jobs will succeed.
> > >
> > > Has anybody encountered this problem? Can you give me some advice?
> > >
> > > And occasionally there will be some error logs like:
> > >
> > > 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > >         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > >         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > >         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > >         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > >         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > > 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
> > > 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
> > > 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > >         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > >         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > >         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > >         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > >         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > >
> > > Is there some connection between these two errors?
> > >
> > > thank you very much!
> > >
> > > xiaobin
> >
> >
> 
> 

