hive-user mailing list archives

From Gautam <gautamkows...@gmail.com>
Subject Re: Tez jobs on YARN failing sporadically..
Date Wed, 06 Jul 2016 07:04:54 GMT
We found out what happened here. As suspected, this wasn't an issue with
Tez. The job localizer thread on some NMs was crashing with:

2016-07-02 10:20:17,881 ERROR
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Failed to submit rsrc { {
hdfs://master-nn-host:8020/parquet_loader/0052919-160630152347927-oozie-oozi-W/script.q,
1467450680162, FILE, null
},pending,[(container_e25_1467304052008_27086_01_000077)],36144839749319326,FAILED}
for download. Either queue is full or threadpool is
shutdown.java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.ExecutorCompletionService$QueueingFuture@921a73e
rejected from java.util.concurrent.ThreadPoolExecutor@3283d190[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
109]



I think we ran into one of the many localization issues reported here:
https://issues.apache.org/jira/browse/YARN-543

In particular, the symptom is that the NM fails to spawn the task container
due to init issues. This affected MR and Tez jobs alike, sometimes even
crashing the AM initialization itself.
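
For anyone hitting the same trace, here is a minimal standalone Java sketch
of how that exception arises (this is not the actual
ResourceLocalizationService code; the class and task names are made up for
illustration): once the backing ThreadPoolExecutor has been shut down, any
further submit() on the ExecutorCompletionService is rejected outright,
which matches the "Terminated, pool size = 0" state in the log above.

import java.util.concurrent.*;

public class RejectedSubmitDemo {
    public static void main(String[] args) {
        // Hypothetical demo: a completion service backed by a small thread
        // pool, standing in for the localizer's download pool.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<String> ecs =
                new ExecutorCompletionService<>(pool);

        ecs.submit(() -> "download ok"); // accepted while the pool is live

        pool.shutdown(); // pool heads toward the Terminated state seen in the log

        try {
            // Submitting after shutdown is rejected immediately; this is the
            // same RejectedExecutionException the NM logged for the
            // QueueingFuture.
            ecs.submit(() -> "download after shutdown");
        } catch (RejectedExecutionException e) {
            System.err.println("rejected: " + e);
        }
    }
}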

*Restarting the affected NMs fixed the issue.*


-Gautam.


On Tue, Jul 5, 2016 at 11:55 PM, Gopal Vijayaraghavan <gopalv@apache.org>
wrote:

>
>
> > when the executor is overwhelmed with tasks or execute() is called while
> > shutting down. I'm confounded as to why this would be an issue suddenly.
>
> > Container container_e23_1466828114374_53316_01_000018 finished with
> > diagnostics set to Container failed, exitCode=-1000. Task
> > java.util.concurrent.ExecutorCompletionService$QueueingFuture@6c5f576
> > rejected from java.util.concurrent.ThreadPoolExecutor@9bf8295
> > Terminated, pool size = 0, active threads = 0, queued tasks = 0,
> > completed tasks = 111
>
> As always, this needs more info, mostly from yarn logs -applicationId
> <application>.
>
> It's not entirely clear whether this is happening in the NM or the task
> itself.
>
> The active threads = 0 suggests this might be related to pam_limits
> nproc, causing threads to exit without running.
>
> Did you reboot the system recently?
>
> Cheers,
> Gopal
>
>
>


-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."
