hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6922) MapReduce jobs may fail during rolling upgrade due to MAPREDUCE-6829
Date Mon, 31 Jul 2017 21:35:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-6922:
    Target Version/s: 2.9.0

It looks like FrameworkCounterGroup needs to be updated to protect itself from out-of-bounds
counter indices coming from the task report.  Until that fix has been rolled out as a baseline,
we can never safely add new counters without breaking this code (unless MapReduce is deployed
via the distributed cache, see below).  Ultimately, moving to protocol buffers would be even
better, but that's not something we could accomplish until a major release like Hadoop 3.0.
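
As a rough illustration of the kind of guard meant here, the sketch below shows a reader that
skips counter indices it does not recognize instead of indexing past the end of its array.
Everything in it is hypothetical: BoundedCounterGroup, OldCounter, and encode are stand-ins
invented for this example, not Hadoop's FrameworkCounterGroup, and the fixed-width
(index, value) wire format is a simplification assumed for clarity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical stand-in for a framework counter group; NOT Hadoop's class. */
public class BoundedCounterGroup {
    /** The counters known to the "old" reader (illustrative subset). */
    public enum OldCounter { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, SPILLED_RECORDS }

    private final long[] values = new long[OldCounter.values().length];
    private int skipped;

    /** Encodes (index, value) pairs as a newer writer might (assumed format). */
    public static byte[] encode(int[] indices, long[] vals) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeLong(vals[i]);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads counters, ignoring any index this version does not know about. */
    public void readFields(DataInputStream in) throws IOException {
        Arrays.fill(values, 0L);
        skipped = 0;
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            int index = in.readInt();
            long value = in.readLong(); // always consume the value to stay aligned
            if (index < 0 || index >= values.length) {
                skipped++;              // unknown counter from a newer writer
                continue;
            }
            values[index] = value;
        }
    }

    public long get(OldCounter counter) { return values[counter.ordinal()]; }

    public int skippedCount() { return skipped; }

    public static void main(String[] args) throws IOException {
        // Index 23 mimics a counter added by a newer MapReduce version.
        byte[] wire = encode(new int[] {0, 2, 23}, new long[] {10L, 7L, 99L});
        BoundedCounterGroup group = new BoundedCounterGroup();
        group.readFields(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println("known counters kept, skipped=" + group.skippedCount());
    }
}
```

The point of the sketch is that the value is read before the bounds check, so an unknown
index costs one dropped counter rather than a desynchronized stream or an exception.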

I should point out that jobs would not be susceptible to this failure if they deployed MapReduce
via HDFS rather than picking it up from the nodes.  See http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html.
That deployment method has the benefit that a job always runs with a consistent version of
MapReduce, and therefore changes like this _would_ be OK during a rolling upgrade.  Jobs would
run entirely with either the old MapReduce version or the new one, rather than a hodgepodge
of both, which could lead to errors like this.

If distributed cache deploy is not sufficient and this still needs to be reverted, then normally
we would simply revert MAPREDUCE-6829 from branch-2, remove 2.9.0 from the fix version of
that JIRA, and mark this as a duplicate of that JIRA.  2.9.0 has not shipped, so we don't need
a tracking JIRA to note that something was removed, since no 2.x release ever claimed
it was added.

> MapReduce jobs may fail during rolling upgrade due to MAPREDUCE-6829
> --------------------------------------------------------------------
>                 Key: MAPREDUCE-6922
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6922
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>            Priority: Blocker
>         Attachments: YARN-6922.branch-2.000.patch
> MAPREDUCE-6829 should be reverted from branch-2 because rolling upgrade fails.
> {code}
> 2017-06-08 17:43:37,173 WARN [Socket Reader #1 for port 41187] org.apache.hadoop.ipc.Server: Unable to read call parameters for client connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
> java.lang.ArrayIndexOutOfBoundsException: 23
> 	at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.readFields(FrameworkCounterGroup.java:261)
> 	at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
> 	at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:306)
> 	at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
> 	at org.apache.hadoop.mapred.MapTaskStatus.readFields(MapTaskStatus.java:88)
> 	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
> 	at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
> 	at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1909)
> 	at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1841)
> 	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1600)
> 	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:820)
> 	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:693)
> 	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:664)
> {code}
