hive-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3958) support partial scan for analyze command - RCFile
Date Thu, 04 Apr 2013 23:40:55 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623043#comment-13623043 ]

Hudson commented on HIVE-3958:
------------------------------

Integrated in Hive-trunk-hadoop2 #138 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/138/])
    HIVE-3958 support partial scan for analyze command - RCFile (Gang Tim Liu via namit) (Revision 1461586)

     Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1461586
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanMapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java
* /hive/trunk/ql/src/test/queries/clientnegative/stats_partialscan_autogether.q
* /hive/trunk/ql/src/test/queries/clientnegative/stats_partialscan_non_external.q
* /hive/trunk/ql/src/test/queries/clientnegative/stats_partialscan_non_native.q
* /hive/trunk/ql/src/test/queries/clientnegative/stats_partscan_norcfile.q
* /hive/trunk/ql/src/test/queries/clientpositive/stats_partscan_1.q
* /hive/trunk/ql/src/test/results/clientnegative/stats_partialscan_autogether.q.out
* /hive/trunk/ql/src/test/results/clientnegative/stats_partialscan_non_external.q.out
* /hive/trunk/ql/src/test/results/clientnegative/stats_partialscan_non_native.q.out
* /hive/trunk/ql/src/test/results/clientnegative/stats_partscan_norcfile.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats_partscan_1.q.out

                
> support partial scan for analyze command - RCFile
> -------------------------------------------------
>
>                 Key: HIVE-3958
>                 URL: https://issues.apache.org/jira/browse/HIVE-3958
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Gang Tim Liu
>            Assignee: Gang Tim Liu
>             Fix For: 0.11.0
>
>         Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, HIVE-3958.patch.4,
>                      HIVE-3958.patch.5, HIVE-3958.patch.6
>
>
> The analyze command collects statistics on existing tables and partitions. It works
> well but can be slow because it scans every file.
> There are two ways to speed it up:
> 1. Collect stats without a file scan. This may not collect all stats, but it is fast and
> good enough for some use cases. HIVE-3917 addresses it.
> 2. Collect stats via a partial file scan. This reads only the part of each file needed
> to get file metadata rather than the full content. Examples are RCFile
> (https://cwiki.apache.org/Hive/rcfilecat.html), ORC (HIVE-3874), and HBase's HFile.
> This JIRA addresses #2, specifically for the RCFile format.
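
For illustration, the partial scan described in the issue would be invoked
roughly as follows. This is a minimal sketch, not taken from the patch: the
table name page_views_rc and the partition column ds are hypothetical, and the
PARTIALSCAN keyword is assumed to follow the ANALYZE syntax documented for
Hive 0.11. The negative test names in the file list suggest the command is
rejected for non-RCFile and non-native tables.

```sql
-- Hypothetical RCFile-backed, partitioned table (names are illustrative only).
-- Full scan: reads every row of the partition to compute statistics.
ANALYZE TABLE page_views_rc PARTITION (ds='2013-04-04') COMPUTE STATISTICS;

-- Partial scan (assumed syntax): reads only file metadata instead of all content.
ANALYZE TABLE page_views_rc PARTITION (ds='2013-04-04') COMPUTE STATISTICS PARTIALSCAN;
```

The partial form trades completeness of the collected statistics for speed,
which matches the motivation stated in the issue description.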

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
