hive-issues mailing list archives

From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12882) Automatically choose to use noscan for stats collection
Date Fri, 15 Jan 2016 23:00:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102636#comment-15102636 ]

Pengcheng Xiong commented on HIVE-12882:
----------------------------------------

Why does RCFile need all 3 modes? This is something that I do not understand. :) They achieve
the same purpose, and noscan obviously outperforms the other two without any downside. So
why do we need to keep the partialscan and fullscan modes? Thanks.

> Automatically choose to use noscan for stats collection
> -------------------------------------------------------
>
>                 Key: HIVE-12882
>                 URL: https://issues.apache.org/jira/browse/HIVE-12882
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>
> noscan leverages the file system to derive the #rows and rawDataSize. According to
> [~ashutoshc], it currently works only with the RC and ORC file types. We would like Hive to
> automatically choose between noscan and scan based on the file system when the stats task
> starts or when the user issues the same query "Analyze ...."
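
A rough sketch of the selection described in the issue, for illustration only. It is not Hive's implementation. It assumes the choice is keyed on the table's storage format, and that only RCFile and ORC qualify for noscan, as the description states. The names NOSCAN_CAPABLE_FORMATS, choose_stats_mode, and build_analyze_statement are hypothetical.

```python
# Hypothetical sketch: pick noscan when the storage format supports it.
# Assumption (from the issue description): only RC and ORC files qualify.
NOSCAN_CAPABLE_FORMATS = {"RCFILE", "ORC"}


def choose_stats_mode(file_format: str) -> str:
    """Return "noscan" for formats assumed to support it, else "scan"."""
    normalized = file_format.strip().upper()
    return "noscan" if normalized in NOSCAN_CAPABLE_FORMATS else "scan"


def build_analyze_statement(table: str, file_format: str) -> str:
    """Build the ANALYZE statement a user would otherwise write by hand."""
    statement = f"ANALYZE TABLE {table} COMPUTE STATISTICS"
    if choose_stats_mode(file_format) == "noscan":
        statement += " NOSCAN"
    return statement
```

With this sketch, an ORC table would get the NOSCAN variant of the statement, while a text-format table would fall back to a plain scan.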



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
