hive-dev mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14828) Cloud/S3: Stats publishing should be on HDFS instead of S3
Date Fri, 23 Sep 2016 07:13:20 GMT
Rajesh Balamohan created HIVE-14828:
---------------------------------------

             Summary: Cloud/S3: Stats publishing should be on HDFS instead of S3
                 Key: HIVE-14828
                 URL: https://issues.apache.org/jira/browse/HIVE-14828
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
            Reporter: Rajesh Balamohan
            Assignee: Rajesh Balamohan
            Priority: Minor


Currently, stats files are created in S3. Later, FSStatsAggregator reads these
files back and populates the metastore (MS).

{noformat}
2016-09-23 05:57:46,772 INFO  [main]: fs.FSStatsPublisher (FSStatsPublisher.java:init(49)) - created : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
2016-09-23 05:57:46,773 DEBUG [main]: fs.FSStatsAggregator (FSStatsAggregator.java:connect(53)) - About to read stats from : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
{noformat}

Instead, stats could be written directly to HDFS and read back from there rather than
from S3, which would save a couple of calls to S3.
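
A minimal sketch of the idea, not the actual patch: if the staging directory is on a blob store, the stats directory is relocated under an HDFS scratch directory, otherwise it is left alone. The class name, the helper methods, the list of schemes, and the HDFS scratch URI are all assumptions made up for illustration; none of them are existing Hive APIs.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class StatsTmpDirChooser {
    // Assumption: schemes treated as blob stores in this sketch.
    private static final List<String> BLOB_SCHEMES = Arrays.asList("s3", "s3a", "s3n");

    // True when the path's scheme is one of the blob store schemes above.
    static boolean isBlobStore(URI path) {
        String scheme = path.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase());
    }

    // Returns the directory where stats files would be published and later read.
    // For blob store staging dirs, the path is re-rooted under hdfsScratchDir
    // (hypothetical); for anything else, the staging dir is returned unchanged.
    static URI chooseStatsDir(URI stagingDir, URI hdfsScratchDir) {
        if (!isBlobStore(stagingDir)) {
            return stagingDir;
        }
        String base = hdfsScratchDir.toString();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // Keep the original path components so concurrent queries do not collide.
        return URI.create(base + stagingDir.getPath());
    }

    public static void main(String[] args) {
        // Hypothetical, shortened staging dir and HDFS scratch location.
        URI staging = URI.create("s3a://BUCKET/test/.hive-staging_hive_example/-ext-10001");
        URI scratch = URI.create("hdfs://namenode:8020/tmp/hive/stats");
        System.out.println(chooseStatsDir(staging, scratch));
    }
}
```

With this approach, both the publisher and the aggregator would have to be handed the same relocated path, so that the aggregator reads from HDFS what the publisher wrote there.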



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)