hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-22416) MR-related operation logs missing when parallel execution is enabled
Date Tue, 29 Oct 2019 03:12:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961630#comment-16961630
] 

Hive QA commented on HIVE-22416:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984182/HIVE-22416.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17546 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/19183/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19183/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19183/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12984182 - PreCommit-HIVE-Build

>  MR-related operation logs missing when parallel execution is enabled
> ---------------------------------------------------------------------
>
>                 Key: HIVE-22416
>                 URL: https://issues.apache.org/jira/browse/HIVE-22416
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>         Attachments: HIVE-22416.01.patch
>
>
> Repro steps:
>  1. Happy path, parallel execution disabled
> {code:java}
> 0: jdbc:hive2://localhost:10000> set hive.exec.parallel=false;
> No rows affected (0.023 seconds)
> 0: jdbc:hive2://localhost:10000> select count  (*) from t1;
> INFO  : Compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d):
select count  (*) from t1
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint,
comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d);
Time taken: 0.309 seconds
> INFO  : Executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d):
select count  (*) from t1
> WARN  : 
> INFO  : Query ID = karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Number of reduce tasks determined at compile time: 1
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapreduce.job.reduces=<number>
> DEBUG : Configuring job job_local495362389_0008 with file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
as the submit dir
> DEBUG : adding the following namenodes' delegation tokens:[file:///]
> DEBUG : Creating splits at file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
> INFO  : number of splits:0
> INFO  : Submitting tokens for job: job_local495362389_0008
> INFO  : Executing with tokens: []
> INFO  : The url to track the job: http://localhost:8080/
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2019-10-28 15:26:22,537 Stage-1 map = 0%,  reduce = 100%
> INFO  : Ended Job = job_local495362389_0008
> INFO  : MapReduce Jobs Launched: 
> INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> INFO  : Total MapReduce CPU Time Spent: 0 msec
> INFO  : Completed executing command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d);
Time taken: 6.497 seconds
> INFO  : OK
> DEBUG : Shutting down query select count  (*) from t1
> +-----+
> | c0  |
> +-----+
> | 0   |
> +-----+
> 1 row selected (11.874 seconds)
> {code}
> 2. Faulty path, parallel execution enabled
> {code:java}
> 0: jdbc:hive2://localhost:10000> set hive.server2.logging.operation.level=EXECUTION;
> No rows affected (0.236 seconds)
> 0: jdbc:hive2://localhost:10000> set hive.exec.parallel=true;
> No rows affected (0.01 seconds)
> 0: jdbc:hive2://localhost:10000> select count  (*) from t1;
> INFO  : Compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77):
select count  (*) from t1
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint,
comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77);
Time taken: 4.707 seconds
> INFO  : Executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77):
select count  (*) from t1
> WARN  : 
> INFO  : Query ID = karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in parallel
> INFO  : MapReduce Jobs Launched: 
> INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> INFO  : Total MapReduce CPU Time Spent: 0 msec
> INFO  : Completed executing command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77);
Time taken: 44.577 seconds
> INFO  : OK
> DEBUG : Shutting down query select count  (*) from t1
> +-----+
> | c0  |
> +-----+
> | 0   |
> +-----+
> 1 row selected (54.665 seconds)
> {code}
> The issue is that Log4J stores the session ID and query ID in some atomic thread metadata
(org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the queryId is missing
from this metadata, then the RoutingAppender (which is defined programmatically in LogDivertAppender)
will route the log to a NullAppender, which logs nothing. If the queryId is present, then
the RoutingAppender routes the event to the "query-appender", which will log the line in the
operation log/Beeline. This is not happening in a multi-threaded context since new threads
created for parallel query execution do not have the queryId/sessionId metadata.
> The solution is to add the queryId/sessionId metadata to any new threads created for
MR work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message