hadoop-yarn-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
Date Tue, 21 Mar 2017 00:43:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933876#comment-15933876 ]

ASF GitHub Bot commented on YARN-6302:

Github user szegedim commented on a diff in the pull request:

    --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ConfigurationException.java
    @@ -0,0 +1,44 @@
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.hadoop.yarn.exceptions;
    +import org.apache.hadoop.classification.InterfaceAudience.Public;
    +import org.apache.hadoop.classification.InterfaceStability.Unstable;
    + * This exception is thrown on unrecoverable container launch errors.
    --- End diff --
    Agreed. Fixed the code.
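    The diff above introduces a dedicated ConfigurationException for unrecoverable container launch errors. As a rough illustration of why a separate exception type helps, here is a minimal sketch in which the caller distinguishes a node-wide configuration problem from a single failed container. All class and method names are hypothetical; this is not the code from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Hadoop code): a dedicated exception type lets the
 * caller tell a node-wide configuration problem apart from an ordinary
 * per-container launch failure.
 */
public class LaunchErrorDemo {

    /** Hypothetical: unrecoverable, node-wide misconfiguration. */
    static class NodeConfigurationException extends Exception {
        NodeConfigurationException(String message) {
            super(message);
        }
    }

    /** Hypothetical: a failure that affects only one container. */
    static class ContainerLaunchException extends Exception {
        ContainerLaunchException(String message) {
            super(message);
        }
    }

    private boolean nodeHealthy = true;
    private final List<String> failedContainers = new ArrayList<>();

    /** Simulated launch; the flags select which kind of failure to raise. */
    private void launch(String containerId, boolean configBroken, boolean appBroken)
            throws NodeConfigurationException, ContainerLaunchException {
        if (configBroken) {
            throw new NodeConfigurationException("executor configuration not found");
        }
        if (appBroken) {
            throw new ContainerLaunchException(containerId + " exited with an error");
        }
    }

    /** Only a configuration error marks the whole node as unhealthy. */
    void tryLaunch(String containerId, boolean configBroken, boolean appBroken) {
        try {
            launch(containerId, configBroken, appBroken);
        } catch (NodeConfigurationException e) {
            failedContainers.add(containerId);
            nodeHealthy = false; // the node itself is broken, not just this container
        } catch (ContainerLaunchException e) {
            failedContainers.add(containerId); // the node stays healthy
        }
    }

    boolean isNodeHealthy() {
        return nodeHealthy;
    }

    List<String> getFailedContainers() {
        return failedContainers;
    }

    public static void main(String[] args) {
        LaunchErrorDemo node = new LaunchErrorDemo();
        node.tryLaunch("c1", false, true);        // ordinary container failure
        System.out.println(node.isNodeHealthy()); // true
        node.tryLaunch("c2", true, false);        // configuration failure
        System.out.println(node.isNodeHealthy()); // false
    }
}
```

    Catching the two exception types separately keeps the "fail this container" and "take the node out of service" decisions in one obvious place.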

> Fail the node, if Linux Container Executor is not configured properly
> ---------------------------------------------------------------------
>                 Key: YARN-6302
>                 URL: https://issues.apache.org/jira/browse/YARN-6302
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>            Priority: Minor
> We have a cluster that has one node with a misconfigured Linux Container Executor. Every
> time an AM or regular container is launched on that node, it fails. The node still has
> resources available, so it keeps failing applications until the administrator notices the
> issue and decommissions the node. AM blacklisting only helps if the application is already
> running.
> As a possible improvement, when the LCE is used on the cluster and an NM gets certain
> errors back from the LCE, such as error 24 (configuration not found), we should either stop
> allocating anything on the node or shut the node down entirely. That kind of problem
> normally does not fix itself, and it means that nothing can run on that node.
> {code}
> Application application_1488920587909_0010 failed 2 times due to AM Container for appattempt_1488920587909_0010_000002 exited with exitCode: -1000
> Failing this attempt.Diagnostics: Application application_1488920587909_0010 initialization failed (exitCode=24) with output:
> For more detailed output, check the application tracking page: http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then click on links to logs of each attempt.
> . Failing the application.
> {code}
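The proposal above amounts to a simple policy: certain LCE exit codes indicate that the node, not the application, is broken. A minimal sketch, assuming the NodeManager sees the executor's numeric exit code and that the set of node-fatal codes is configurable. Only code 24 comes from the report; the class and method names are hypothetical and are not Hadoop APIs.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the proposed policy: some Linux Container Executor
 * exit codes mean the node itself is misconfigured, so the node should stop
 * accepting containers instead of failing one application after another.
 */
public class LceExitCodePolicy {

    /** Exit code 24 ("configuration not found") is the example from the report. */
    static final int CONFIG_NOT_FOUND = 24;

    private final Set<Integer> nodeFatalCodes;
    private boolean acceptingContainers = true;

    LceExitCodePolicy(Set<Integer> nodeFatalCodes) {
        this.nodeFatalCodes = nodeFatalCodes;
    }

    /** Records an LCE exit code; returns true only if this call disabled the node. */
    boolean onExecutorExit(int exitCode) {
        if (acceptingContainers && nodeFatalCodes.contains(exitCode)) {
            acceptingContainers = false;
            return true;
        }
        return false;
    }

    boolean isAcceptingContainers() {
        return acceptingContainers;
    }

    public static void main(String[] args) {
        LceExitCodePolicy policy = new LceExitCodePolicy(Set.of(CONFIG_NOT_FOUND));
        policy.onExecutorExit(1);                // a code outside the fatal set
        System.out.println(policy.isAcceptingContainers()); // true
        policy.onExecutorExit(CONFIG_NOT_FOUND); // node-fatal code from the report
        System.out.println(policy.isAcceptingContainers()); // false
    }
}
```

Keeping the fatal set an explicit parameter separates codes that would fail every launch on the node from ordinary failures, which should only fail the affected container.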

This message was sent by Atlassian JIRA

