hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18743) CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
Date Thu, 19 Apr 2018 18:40:01 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444587#comment-16444587
] 

Hive QA commented on HIVE-18743:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue}
Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m  0s{color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 47s{color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 23s{color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  2s{color} |
{color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 57s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green}
the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 22s{color} | {color:red}
standalone-metastore: The patch generated 5 new + 522 unchanged - 12 fixed = 527 total (was
534) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  5s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 58s{color} | {color:black}
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03)
x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-10349/dev-support/hive-personality.sh
|
| git revision | master / 046bc64 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-10349/yetus/diff-checkstyle-standalone-metastore.txt
|
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-10349/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-18743
>                 URL: https://issues.apache.org/jira/browse/HIVE-18743
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 1.2.0, 1.1.0, 2.0.2, 3.0.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Kolbasov
>            Priority: Major
>         Attachments: HIVE-18743.01.patch
>
>
> When hive.stats.autogather=true then the Metastore lists all files under the table directory
to populate basic stats like file counts and sizes. This file listing operation can be very
expensive particularly on filesystems like S3.
> One way to address this issue is to reconfigure hive.stats.autogather=false.
> *Here's the bug*
> It is my understanding that the DO_NOT_UPDATE_STATS table property is intended to selectively
prevent this stats collection. Unfortunately, this table property is checked *after* the expensive
file listing operation, so the DO_NOT_UPDATE_STATS does not seem to work as intended. See:
> https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633
> Relevant code snippet:
> {code}
>   public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
>                                              boolean madeDir, boolean forceRecompute,
EnvironmentContext environmentContext) throws MetaException {
>     if (tbl.getPartitionKeysSize() == 0) {
>       // Update stats only when unpartitioned
>       FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
>       return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
<--- DO_NOT_UPDATE_STATS is checked in here after wh.getFileStatusesForUnpartitionedTable()
has already been called
>     } else {
>       return false;
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message