hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21915) Hive with TEZ UNION ALL and UDTF results in data loss
Date Mon, 24 Jun 2019 18:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871686#comment-16871686 ]

Hive QA commented on HIVE-21915:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12972724/HIVE-21915.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 16339 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.TestDFSErrorHandling.testAccessDenied (batchId=272)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17710/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17710/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17710/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12972724 - PreCommit-HIVE-Build

> Hive with TEZ UNION ALL and UDTF results in data loss
> -----------------------------------------------------
>
>                 Key: HIVE-21915
>                 URL: https://issues.apache.org/jira/browse/HIVE-21915
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.1
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>         Attachments: HIVE-21915.01.patch
>
>
> The HQL looks like this:
> CREATE TEMPORARY TABLE tez_union_all_loss_data AS
> SELECT xxx, yyy, zzz,1 as tag
> FROM ods_1
> UNION ALL
> SELECT xxx, yyy, zzz, tag
> FROM
> (
> SELECT xxx
> ,get_json_object(get_json_object(tb,'$.a'),'$.b') AS yyy
> ,zzz
> ,2 as tag
> FROM ods_2
> LATERAL VIEW EXPLODE(some_udf(uuu)) team_number AS tb
> ) tbl 
> ;
>  
> With the above HQL, we expect rows with both tag = 1 and tag = 2 to appear. In our case, however, all the rows with tag = 1 are lost.
> Digging deeper, we found that the two generated map tasks have identical task tmp paths. This happens because, when a UDTF is present, the FileSinkOperator is processed twice while the tmp path is generated in GenTezUtils.removeUnionOperators().
>  
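The failure mode described above (two writers resolving to the same task tmp path) can be illustrated with a toy model. This is only a sketch under stated assumptions: a dict stands in for the file system, the path strings and the function name `run_union_all` are hypothetical, and nothing here is taken from Hive's or Tez's actual code or commit logic.

```python
# Toy model only: a dict stands in for the file system, and none of these
# names or paths come from Hive. It shows why two writers that share one
# tmp path can lose one branch's rows.

def run_union_all(branches, tmp_path_for):
    """Write each branch's rows to its tmp path, then collect what survived."""
    fs = {}  # path -> rows; writing to an existing path replaces its content
    for index, rows in enumerate(branches):
        fs[tmp_path_for(index)] = list(rows)
    # The final result is whatever is left under the tmp paths.
    return [row for path in sorted(fs) for row in fs[path]]


branch_tag1 = [("a", 1), ("b", 1)]  # stands in for the ods_1 branch
branch_tag2 = [("c", 2)]            # stands in for the LATERAL VIEW branch

# Both branches resolve to the same (hypothetical) tmp path:
# the second write replaces the first.
colliding = run_union_all(
    [branch_tag1, branch_tag2], lambda i: "_task_tmp/000000_0"
)

# Each branch gets its own (hypothetical) tmp path: both sets of rows survive.
distinct = run_union_all(
    [branch_tag1, branch_tag2], lambda i: f"_task_tmp/{i}/000000_0"
)

print(sorted({tag for _, tag in colliding}))  # [2]
print(sorted({tag for _, tag in distinct}))   # [1, 2]
```

In this model the colliding run keeps only the tag = 2 rows, which matches the symptom in the report; the real mechanism inside Hive may differ in detail.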



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
