hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11940) "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
Date Sat, 26 Sep 2015 05:35:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909107#comment-14909107 ]

Hive QA commented on HIVE-11940:
--------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12762161/HIVE-11940.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9590 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-orc_vectorization_ppd.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-enforce_order.q-constprog_dpp.q-auto_join1.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5419/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5419/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5419/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12762161 - PreCommit-HIVE-TRUNK-Build

> "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11940
>                 URL: https://issues.apache.org/jira/browse/HIVE-11940
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-11940.1.patch, HIVE-11940.2.patch
>
>
> When hive.exec.stagingdir is set to ".hive-staging", the staging directory is placed under the target directory. When an "INSERT OVERWRITE" query runs, Hive then grabs all files under the staging directory and copies them ONE BY ONE to the target directory.
> When hive.exec.stagingdir is set to "/tmp/hive", Hive simply does a RENAME operation, which is instant.
> This happens with files that are not encrypted.
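
The difference described above comes down to how many operations are needed to publish the staged files. The following is a minimal sketch on a local filesystem, not Hive's code and not HDFS. The function names (publish_by_copy, publish_by_rename, make_staging, demo) and the part-NNNNN file names are hypothetical and chosen only for illustration. It counts one operation per file for the copy path, standing in for one "distcp" per file, and a single operation for the rename path.

```python
import os
import shutil
import tempfile


def publish_by_copy(staging_dir, target_dir):
    """Copy every staged file individually; return the number of copy operations."""
    os.makedirs(target_dir, exist_ok=True)
    copies = 0
    for name in sorted(os.listdir(staging_dir)):
        # One copy per file: the cost grows with the number of files.
        shutil.copy2(os.path.join(staging_dir, name), os.path.join(target_dir, name))
        copies += 1
    shutil.rmtree(staging_dir)
    return copies


def publish_by_rename(staging_dir, target_dir):
    """Move the whole staging directory with one rename; return the operation count."""
    # target_dir must not exist yet; the cost does not depend on the file count.
    os.rename(staging_dir, target_dir)
    return 1


def make_staging(root, name, file_count):
    """Create a hypothetical staging directory holding file_count small files."""
    staging = os.path.join(root, name)
    os.makedirs(staging)
    for i in range(file_count):
        with open(os.path.join(staging, f"part-{i:05d}"), "w") as handle:
            handle.write(f"row {i}\n")
    return staging


def demo(file_count=100):
    """Return (copy operations, rename operations, files copied, files renamed)."""
    with tempfile.TemporaryDirectory() as root:
        copy_ops = publish_by_copy(
            make_staging(root, "staging_a", file_count),
            os.path.join(root, "target_a"),
        )
        rename_ops = publish_by_rename(
            make_staging(root, "staging_b", file_count),
            os.path.join(root, "target_b"),
        )
        copied = len(os.listdir(os.path.join(root, "target_a")))
        renamed = len(os.listdir(os.path.join(root, "target_b")))
    return copy_ops, rename_ops, copied, renamed


if __name__ == "__main__":
    print(demo())
```

Both paths leave the same files in the target directory; only the number of operations differs, which is why the per-file path slows down as the file count grows.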



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
