hadoop-mapreduce-user mailing list archives

From 麦树荣 <shurong....@qunar.com>
Subject Re: Re: problems of FairScheduler in hadoop2.2.0
Date Thu, 28 Nov 2013 06:11:37 GMT
Hi,

Thanks for your attention.

When the jobs cannot run and every job stays in the “submitted” state after submission,
the scheduler section of the resourcemanager web UI (the red frame in the picture below)
cannot be opened, and the resourcemanager log shows the following exception:

2013-11-27 14:41:36,414 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerInfo.getAppFairShare(FairSchedulerInfo.java:49)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:97)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:40)
        at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30347)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerPage$QueuesBlock.render(FairSchedulerPage.java:176)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
        at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
        at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
        at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
        at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
        at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
        at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:82)
        ... 40 more

[Attached image: screenshot of the resourcemanager web UI with the scheduler section framed in red]
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: November 28, 2013, 1:20
To: user@hadoop.apache.org
Subject: Re: Re: problems of FairScheduler in hadoop2.2.0

Thanks for the additional info.  Still not sure what could be going on.  Do you notice any
other suspicious LOG messages in the resourcemanager log?  Are you able to show the results
of <resourcemanagerwebaddress>/ws/v1/cluster/scheduler?  On the resourcemanager web
UI, how much memory does it say is used?
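
The scheduler REST query asked about above can also be scripted. The Python snippet below is only a sketch under stated assumptions: the address `rm-host:8088` is a placeholder (8088 is the usual default port of the resourcemanager web address, not a value taken from this thread), and the helper names `scheduler_url` and `fetch_scheduler_info` are made up for illustration.

```python
import json
import urllib.request


def scheduler_url(rm_address):
    """Build the scheduler REST URL for a resourcemanager web address (host:port)."""
    return "http://%s/ws/v1/cluster/scheduler" % rm_address


def fetch_scheduler_info(rm_address, timeout=10):
    """GET the scheduler info as JSON and return it as a Python dict."""
    request = urllib.request.Request(
        scheduler_url(rm_address),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "rm-host:8088" is a placeholder; substitute the real resourcemanager web address.
    info = fetch_scheduler_info("rm-host:8088")
    print(json.dumps(info, indent=2))
```

If the request fails with an HTTP error, the error itself is worth including in a reply, since it may match the exception in the resourcemanager log.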

On Wed, Nov 27, 2013 at 1:28 AM, 麦树荣 <shurong.mai@qunar.com> wrote:
Hi,

Sorry, let me add some information.

Hadoop 2.2.0 had been running normally for several days since I started the Hadoop servers,
and I could run jobs without any problems.
Today the jobs suddenly stopped running, and every job stays in the “submitted” state
after submission.
There are 3 slave nodes, and each slave has 32 GB of memory and 24 CPUs.

The contents of my fair-scheduler.xml are as follows:

<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <minResources>10000mb,10vcores</minResources>
    <maxResources>90000mb,100vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <weight>2.0</weight>
    <schedulingMode>fair</schedulingMode>
    <aclSubmitApps> </aclSubmitApps>
    <aclAdministerApps> </aclAdministerApps>
    <queue name="queue1">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
      <maxRunningApps>10</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps>xxx1,xxx2 admins</aclAdministerApps>
      <aclSubmitApps>xxx1,xxx2,xxx3 datadev</aclSubmitApps>
    </queue>
    <queue name="queue2">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
      <maxRunningApps>10</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps>datadev admins</aclAdministerApps>
      <aclSubmitApps>xxx1 datadev</aclSubmitApps>
    </queue>
    <queue name="queue3">
      <minResources>5000mb,5vcores</minResources>
      <maxResources>10000mb,10vcores</maxResources>
      <maxRunningApps>10</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps>datadev admins</aclAdministerApps>
      <aclSubmitApps>xxx1,xxx2 datadev</aclSubmitApps>
    </queue>
    <queue name="default">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
      <maxRunningApps>10</maxRunningApps>
      <weight>2.0</weight>
      <schedulingMode>fair</schedulingMode>
      <aclAdministerApps>xxx1 admins</aclAdministerApps>
      <aclSubmitApps>xxx1,xxx2,xxx3,root datadev</aclSubmitApps>
    </queue>
  </queue>
  <user name="xxx">
    <maxRunningApps>10</maxRunningApps>
  </user>
  <userMaxAppsDefault>10</userMaxAppsDefault>
</allocations>
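
As a rough cross-check of an allocation file like the one above, the per-queue limits can be tallied with a short script. This Python sketch is an illustration only, not something used in this thread: the helper names are hypothetical, the embedded `SAMPLE` copies just the min/max values of the queues shown above, and the parser assumes every queue sets both `minResources` and `maxResources` in the `Xmb,Yvcores` form.

```python
import re
import xml.etree.ElementTree as ET

RESOURCE_RE = re.compile(r"(\d+)\s*mb\s*,\s*(\d+)\s*vcores", re.IGNORECASE)

# Trimmed copy of the queue limits from the allocation file (ACLs, weights omitted).
SAMPLE = """<allocations>
  <queue name="root">
    <minResources>10000mb,10vcores</minResources>
    <maxResources>90000mb,100vcores</maxResources>
    <queue name="queue1">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
    </queue>
    <queue name="queue2">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
    </queue>
    <queue name="queue3">
      <minResources>5000mb,5vcores</minResources>
      <maxResources>10000mb,10vcores</maxResources>
    </queue>
    <queue name="default">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
    </queue>
  </queue>
</allocations>"""


def parse_resources(text):
    """Turn a string such as '10000mb,10vcores' into (10000, 10)."""
    match = RESOURCE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognized resource string: %r" % text)
    return int(match.group(1)), int(match.group(2))


def collect_queues(xml_text):
    """Map each dotted queue path to its 'min' and 'max' (mb, vcores) pairs."""
    queues = {}

    def walk(element, prefix):
        for queue in element.findall("queue"):
            name = queue.get("name")
            path = prefix + "." + name if prefix else name
            # Assumption: both elements are present on every queue.
            queues[path] = {
                "min": parse_resources(queue.findtext("minResources")),
                "max": parse_resources(queue.findtext("maxResources")),
            }
            walk(queue, path)

    walk(ET.fromstring(xml_text), "")
    return queues


def child_total(queues, parent, key):
    """Sum 'min' or 'max' over the direct children of the queue `parent`."""
    children = [limits[key] for path, limits in queues.items()
                if "." in path and path.rsplit(".", 1)[0] == parent]
    return sum(c[0] for c in children), sum(c[1] for c in children)


if __name__ == "__main__":
    queues = collect_queues(SAMPLE)
    print("root limits:       ", queues["root"])
    print("children min total:", child_total(queues, "root", "min"))  # (35000, 35)
    print("children max total:", child_total(queues, "root", "max"))  # (100000, 100)
```

Comparing these totals with the memory and vcores the NodeManagers actually advertise (which this thread does not state) is one way to spot limits that cannot all be satisfied at once.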

From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: November 27, 2013, 16:33
To: user@hadoop.apache.org
Subject: Re: problems of FairScheduler in hadoop2.2.0

Hi,

Can you share the contents of your fair-scheduler.xml?  If you submit just a single job, does
it run?  What do you see if you go to <resourcemanagerwebui>/ws/v1/cluster/scheduler?

-Sandy

On Wed, Nov 27, 2013 at 12:09 AM, 麦树荣 <shurong.mai@qunar.com> wrote:
Hi, all

When I run jobs in Hadoop 2.2.0, I encounter a problem. Suddenly the Hadoop resourcemanager
stops working normally: when I submit jobs, they all stay in the “submitted” state and
never run.
I cannot find any answers on the internet. Can anyone give me some help? Thanks.

The resourcemanager log is as follows:

2013-11-27 14:39:10,749 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:11,753 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:11,754 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001

