hadoop-mapreduce-user mailing list archives

From Xiaobin She <xiaobin...@gmail.com>
Subject Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
Date Mon, 06 Feb 2012 08:14:10 GMT
hi Alex,

It seems that the reason that particular tasktracker fails is that there is
not enough disk space on that machine.

Once I cleaned up some disk space, the problem disappeared.

But I still don't understand why.

thx

xiaobin
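
Since the problem went away once disk space was freed, a quick free-space check on the affected node may help catch it earlier. The following is only a minimal sketch: the function name `low_space_dirs` and the threshold are hypothetical, and it assumes the directories to check are the ones the TaskTracker uses for local scratch data (for example, those listed in `mapred.local.dir`). It does not explain why low disk space leads to the KillJobAction messages.

```python
import shutil


def low_space_dirs(dirs, min_free_bytes):
    """Return (path, free_bytes) for every directory whose filesystem
    has fewer than min_free_bytes bytes available.

    `dirs` is a list of existing directory paths; which directories
    matter on a given node is an assumption the caller must supply.
    """
    low = []
    for path in dirs:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        if usage.free < min_free_bytes:
            low.append((path, usage.free))
    return low
```

For example, `low_space_dirs(["/path/to/local/dir"], 1024**3)` would list that (placeholder) directory if its filesystem has less than 1 GiB free.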

2012/2/2 alo alt <wget.null@googlemail.com>

> A poorly written job can easily overload a TaskTracker. The first question
> is why one TT has no problems and the other does. Take a look at the logs
> on that node. If you see messages like "0 slots free", raising the handler
> count could help.
>
> dfs.namenode.handler.count can be set to 15 or similar. 10 is very
> moderate.
>
> best,
>  Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 1, 2012, at 4:11 PM, Xiaobin She wrote:
>
> > hi Alex,
> >
> > I did not set the value of dfs.namenode.handler.count in the config
> > file, so it should be the default value, which I think is 10.
> >
> > I only have two datanodes; is 10 not enough?
> >
> > And if it is not enough, why would the tasktracker keep receiving
> > KillJobAction and deleting unknown jobs?
> >
> > thank you very much for your help!
> >
> > 2012/2/1 alo alt <wget.null@googlemail.com>
> > How many namenode handlers (dfs.namenode.handler.count) have you defined
> > for your cluster?
> >
> > - Alex
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
> >
> > >
> > > hi Alex,
> > >
> > > I'm using jre 1.6.0_24
> > >
> > > with hadoop 0.20.0
> > > hive 0.80
> > >
> > > thx
> > >
> > >
> > > 2012/2/1 alo alt <wget.null@googlemail.com>
> > > Hi,
> > >
> > > + hdfs-user (bcc'd)
> > >
> > > Which JRE version do you use?
> > >
> > > - Alex
> > >
> > > --
> > > Alexander Lorenz
> > > http://mapredit.blogspot.com
> > >
> > > On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
> > >
> > > > hi ,
> > > >
> > > >
> > > > I'm using hive to do some log analysis, and I have encountered a
> > > > problem.
> > > >
> > > > My cluster has 3 nodes, one for NameNode/JobTracker and the other
> > > > two for DataNode/TaskTracker.
> > > >
> > > > One of the tasktrackers will repeatedly receive KillJobAction and
> > > > then delete unknown jobs.
> > > >
> > > > the logs look like:
> > > >
> > > > 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> > > > 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> > > > 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> > > > 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> > > > 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> > > > 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> > > > 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> > > > 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
> > > >
> > > > This happens occasionally, and when it does, this tasktracker
> > > > does nothing but keep receiving KillJobAction and deleting unknown
> > > > jobs, so performance drops.
> > > >
> > > > To solve this problem, I have to restart the cluster,
> > > > but obviously this is not a good solution.
> > > >
> > > > These jobs will eventually be run on the other tasktracker, where
> > > > they run well and succeed.
> > > >
> > > > Has anybody encountered this problem, and can you give me some advice?
> > > >
> > > > Occasionally there are also some error logs like:
> > > >
> > > > 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > > java.io.IOException: Connection reset by peer
> > > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > > >         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > > >         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > > >         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > > >         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > > >         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > > >         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > > > 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
> > > > 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
> > > > 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > > java.io.IOException: Connection reset by peer
> > > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > > >         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > > >         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
> > > >         at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
> > > >         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
> > > >         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
> > > >         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
> > > >
> > > > Is there some connection between these two errors?
> > > >
> > > > thank you very much!
> > > >
> > > > xiaobin
> > >
> > >
> >
> >
>
>
