hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1575) Public localizer crashes with "Localized unkown resource"
Date Wed, 08 Jan 2014 22:04:51 GMT

    [ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865956#comment-13865956 ]

Jason Lowe commented on YARN-1575:
----------------------------------

I think there's a race condition in the public localizer.  The code adds requests to the queue
like this:

{code}
     if (rsrc.tryAcquire()) {
     ....
            pending.put(queue.submit(new FSDownload(lfs, null, conf,
              publicDirDestPath, resource)), request);
{code}

and it pulls requests like this:

{code}
        while (!Thread.currentThread().isInterrupted()) {
          try {
            Future<Path> completed = queue.take();
            LocalizerResourceRequestEvent assoc = pending.remove(completed);
            try {
              Path local = completed.get();
              if (null == assoc) {
                LOG.error("Localized unkonwn resource to " + completed);
{code}

{{pending}} is a ConcurrentHashMap, but that's insufficient because the submit and the put are
two separate steps.  The task handed to queue.submit can finish, and the consumer thread can
take its Future from the queue, before the producer thread executes the subsequent pending.put.
The consumer's pending.remove then returns null, leaving it with a completed download that has
no corresponding pending entry.
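
To make the interleaving concrete, here is a minimal standalone sketch. It is not the NodeManager
code: the class name PendingRaceSketch, the method consumerLookup, and the strings "request-1" and
"/local/path" are made up for illustration, and a same-thread executor plus a latch are used only
to force the bad ordering deterministically. The synchronized variant shows one possible way to
close the window, by making submit+put atomic with respect to the consumer's remove; it is not
necessarily the fix that will be committed.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PendingRaceSketch {

  /** Returns the request the consumer found for the completed task, or null if none. */
  public static String consumerLookup(boolean useLock) throws Exception {
    Map<Future<String>, String> pending = new ConcurrentHashMap<>();
    // Same-thread executor (illustration only): the task completes inside
    // submit(), so its Future is already queued when submit() returns.
    CompletionService<String> queue = new ExecutorCompletionService<>(Runnable::run);
    CountDownLatch submitted = new CountDownLatch(1);
    ExecutorService consumer = Executors.newSingleThreadExecutor();
    try {
      // Consumer: take the completed Future, then look up its request.
      Callable<String> consumerTask = () -> {
        submitted.await();
        Future<String> completed = queue.take();
        if (useLock) {
          // Blocks until the producer has left its synchronized block.
          synchronized (pending) {
            return pending.remove(completed);
          }
        }
        return pending.remove(completed);
      };
      Future<String> found = consumer.submit(consumerTask);

      if (useLock) {
        // Producer: submit and record the request as one atomic step.
        synchronized (pending) {
          Future<String> f = queue.submit(() -> "/local/path");
          submitted.countDown();
          pending.put(f, "request-1");
        }
      } else {
        // Producer: submit, then (too late) record the request.
        Future<String> f = queue.submit(() -> "/local/path");
        submitted.countDown();
        found.get(); // force the bad ordering: consumer looks up before the put
        pending.put(f, "request-1");
      }
      return found.get();
    } finally {
      consumer.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("unsynchronized: " + consumerLookup(false));
    System.out.println("synchronized:   " + consumerLookup(true));
  }
}
```

In the unsynchronized run the consumer gets null for a request that was legitimately submitted,
which corresponds to the "Localized unkonwn resource" branch above. In the synchronized run the
consumer cannot perform its remove until the put has happened, so it finds "request-1".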

> Public localizer crashes with "Localized unkown resource"
> ---------------------------------------------------------
>
>                 Key: YARN-1575
>                 URL: https://issues.apache.org/jira/browse/YARN-1575
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.10, 2.2.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> The public localizer can crash with the error:
> {noformat}
> 2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
> 2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
