hadoop-mapreduce-issues mailing list archives

From "Hadoop QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped
Date Wed, 16 Jun 2021 12:19:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364279#comment-17364279
] 

Hadoop QA commented on MAPREDUCE-7353:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 10s{color} | {color:blue}{color}
| {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} |
{color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green}{color} | {color:green} The patch does not contain any @author tags. {color}
|
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}{color}
| {color:red} The patch doesn't appear to include any new or modified tests. Please justify
why no new tests are needed for this patch. Also please list what manual steps were performed
to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 45s{color}
| {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 39s{color} |
{color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 36s{color} |
{color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color}
| {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 37s{color} |
{color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color}
| {color:green}{color} | {color:green} branch has no errors when building and testing our
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 32s{color} |
{color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} |
{color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
{color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 16m 17s{color} | {color:blue}{color}
| {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  0m 58s{color} |
{color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 32s{color}
| {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 30s{color} |
{color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 30s{color} | {color:green}{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 27s{color} |
{color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 27s{color} | {color:green}{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color}
| {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 30s{color} |
{color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 45s{color}
| {color:green}{color} | {color:green} patch has no errors when building and testing our client
artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} |
{color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} |
{color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
{color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m  6s{color} |
{color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 17s{color} | {color:green}{color}
| {color:green} hadoop-mapreduce-client-app in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 30s{color}
| {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color}
|
| {color:black}{color} | {color:black} {color} | {color:black} 79m 30s{color} | {color:black}{color}
| {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/80/artifact/out/Dockerfile
|
| JIRA Issue | MAPREDUCE-7353 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13026912/MAPREDUCE-7353.001.patch
|
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient
findbugs checkstyle spotbugs |
| uname | Linux 219b68df36c3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 2b304ad6457 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
|  Test Results | https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/80/testReport/
|
| Max. process+thread count | 697 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/80/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Mapreduce job fails when NM is stopped
> --------------------------------------
>
>                 Key: MAPREDUCE-7353
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7353.001.patch
>
>
> The job fails because a task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | Processing the event
EventType: CONTAINER_REMOTE_CLEANUP for container container_e03_1622107691213_1054_01_000005
taskAttempt attempt_1622107691213_1054_m_000000_0 | ContainerLauncherImpl.java:394
> 	Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | KILLING attempt_1622107691213_1054_m_000000_0
| ContainerLauncherImpl.java:209
> 	Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event handler | TaskAttempt
killed because it ran on unusable node node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_000000_0
| JobImpl.java:1401
> 	Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_KILL | TaskAttemptImpl.java:1390
> 	Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator | Killing taskAttempt:attempt_1622107691213_1054_m_000000_0
because it is running on unusable node:node-group-1ZYEq0002:26009 | RMContainerAllocator.java:1066
> 	Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_KILL | TaskAttemptImpl.java:1390
> 	Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
> 	Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event handler | Diagnostics
report from attempt_1622107691213_1054_m_000000_0: Container released on a *lost* node | TaskAttemptImpl.java:2649
> 	Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_KILL | TaskAttemptImpl.java:1390
> 	Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event handler | Too many
fetch-failures for output of task attempt: attempt_1622107691213_1054_m_000000_0 ... raising
fetch failure to map | JobImpl.java:2005
> 	Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
> 	Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event handler | attempt_1622107691213_1054_m_000000_0
transitioned from state SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE
and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
> 	Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
> 	Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event handler | Diagnostics
report from attempt_1622107691213_1054_m_000000_0: cleanup failed for container container_e03_1622107691213_1054_01_000005
: java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to node-group-1ZYEq0002:26009
failed on connection exception: java.net.ConnectException: Connection refused; For more details
see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
> 	Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event handler | Processing
attempt_1622107691213_1054_m_000000_0 of type TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
> 	Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 going to fetch from
node-group-1ZYEq0002:26008 for: [attempt_1622107691213_1054_m_000000_0] | Fetcher.java:318
> 	Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL for node-group-1ZYEq0002:26008
-> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_000000_0
| Fetcher.java:686
> 	Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting fetch failure for
attempt_1622107691213_1054_m_000000_0 to MRAppMaster. | ShuffleSchedulerImpl.java:349
> {code}
> As the logs show, the RM notified the AM of the node update at 16:26:34, but the resulting
kill was skipped because the TA_KILL event is ignored while TaskAttemptImpl is in the SUCCESS_CONTAINER_CLEANUP
state. The next event received is TA_TOO_MANY_FETCH_FAILURE, which moves the attempt to FAILED and causes the task to fail.
>  
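The event ordering described above can be replayed with a minimal sketch. This is not the TaskAttemptImpl code; the class TaskAttemptSketch and its methods are hypothetical names used only for illustration. The ignored TA_KILL and the SUCCESS_CONTAINER_CLEANUP to FAILED transition come from the logs above. The SUCCEEDED result for TA_CONTAINER_CLEANED is an assumption added to make the model complete.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the reported behaviour; not Hadoop code.
public class TaskAttemptSketch {
    public enum State { SUCCESS_CONTAINER_CLEANUP, SUCCEEDED, FAILED }
    public enum Event { TA_KILL, TA_TOO_MANY_FETCH_FAILURE, TA_CONTAINER_CLEANED }

    // One transition step for an attempt, as described in the report.
    public static State transition(State state, Event event) {
        if (state != State.SUCCESS_CONTAINER_CLEANUP) {
            return state; // this sketch treats the other states as terminal
        }
        switch (event) {
            case TA_KILL:
                return state;           // ignored: the node-unusable kill is dropped
            case TA_TOO_MANY_FETCH_FAILURE:
                return State.FAILED;    // transition seen in the log at 16:26:57
            case TA_CONTAINER_CLEANED:
                return State.SUCCEEDED; // assumption, not shown in the logs above
            default:
                return state;
        }
    }

    // Applies the events in order and returns the final state.
    public static State replay(State start, List<Event> events) {
        State current = start;
        for (Event e : events) {
            current = transition(current, e);
        }
        return current;
    }

    public static void main(String[] args) {
        // Event order taken from the log excerpt, by timestamp.
        List<Event> observed = Arrays.asList(
            Event.TA_KILL, Event.TA_KILL, Event.TA_KILL,
            Event.TA_TOO_MANY_FETCH_FAILURE,
            Event.TA_CONTAINER_CLEANED, Event.TA_CONTAINER_CLEANED);
        System.out.println(replay(State.SUCCESS_CONTAINER_CLEANUP, observed)); // prints FAILED
    }
}
```

In this model the three TA_KILL events leave the attempt in SUCCESS_CONTAINER_CLEANUP, so the later fetch-failure event is the first one that changes its state.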



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

