hadoop-yarn-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5924) Resource Manager fails to load state with InvalidProtocolBufferException
Date Tue, 22 Nov 2016 16:03:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687123#comment-15687123
] 

ASF GitHub Bot commented on YARN-5924:
--------------------------------------

GitHub user ameks94 opened a pull request:

    https://github.com/apache/hadoop/pull/164

    YARN-5924 - Resource Manager fails to load state with InvalidProtocolBufferException

    The solution is to catch InvalidProtocolBufferException, log a warning, and remove the
    application's folder that contains the invalid data, so that an RM restart does not fail.

    Additionally, I've added a catch for other exceptions that can occur while recovering
    a specific application, so that the RM does not fail when the state of only one
    application can't be loaded.
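
A minimal, self-contained sketch of the approach described above, not the actual patch.
It uses the local file system instead of HDFS, and every name in it is hypothetical:
InvalidStateException stands in for com.google.protobuf.InvalidProtocolBufferException,
parseAppState stands in for ApplicationStateDataProto.parseFrom, and the "OK:" prefix
is an invented marker for valid data. Each application is assumed to have its own
directory containing a state file named after the application ID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TolerantRecoverySketch {

    /** Hypothetical stand-in for com.google.protobuf.InvalidProtocolBufferException. */
    static class InvalidStateException extends IOException {
        InvalidStateException(String message) { super(message); }
    }

    /** Hypothetical parser; in this sketch valid state data starts with "OK:". */
    static String parseAppState(byte[] data) throws InvalidStateException {
        String text = new String(data, StandardCharsets.UTF_8);
        if (!text.startsWith("OK:")) {
            throw new InvalidStateException("malformed application state data");
        }
        return text.substring(3);
    }

    /** Loads every application under appRoot, skipping the ones that fail to load. */
    static Map<String, String> loadAppStates(Path appRoot) throws IOException {
        List<Path> appDirs;
        try (Stream<Path> children = Files.list(appRoot)) {
            appDirs = children.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        Map<String, String> recovered = new TreeMap<>();
        for (Path appDir : appDirs) {
            String appId = appDir.getFileName().toString();
            try {
                byte[] data = Files.readAllBytes(appDir.resolve(appId));
                recovered.put(appId, parseAppState(data));
            } catch (InvalidStateException e) {
                // Invalid data: warn and remove the folder so the next restart is clean.
                System.err.println("WARN: removing invalid state of " + appId + ": " + e.getMessage());
                deleteRecursively(appDir);
            } catch (Exception e) {
                // Any other per-application failure: warn and continue with the rest.
                System.err.println("WARN: could not recover " + appId + ": " + e);
            }
        }
        return recovered;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Note the trade-off in this sketch: removing the folder discards that application's
stored state for good, in exchange for letting the remaining applications recover.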


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ameks94/hadoop YARN-5924

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #164
    
----
commit 65c7c6696ab3d6a9f69c235b915d4bf30d190b52
Author: Oleksii Dymytrov <ameks94@gmail.com>
Date:   2016-11-22T15:33:54Z

    YARN-5924 - Resource Manager fails to load state with InvalidProtocolBufferException

----


> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
>                 Key: YARN-5924
>                 URL: https://issues.apache.org/jira/browse/YARN-5924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Oleksii Dymytrov
>            Assignee: Oleksii Dymytrov
>         Attachments: YARN-5924-branch-3.0.0-alpha1.001.patch
>
>
> InvalidProtocolBufferException is thrown during recovery of an application's state
> if the application's data under the
> FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS has an
> invalid format (or is corrupted):
> {noformat}
> com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
> 	at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
> 	at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
> 	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> 	at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
> {noformat}
> A possible solution is to catch InvalidProtocolBufferException, log a warning, and
> remove the application's folder that contains the invalid data, so that an RM restart
> does not fail.
> Additionally, I've added a catch for other exceptions that can occur while recovering
> a specific application, so that the RM does not fail when the state of only one
> application can't be loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


