hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5562) Provide stripe level column statistics in ORC
Date Sat, 19 Oct 2013 21:21:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800002#comment-13800002 ]

Hive QA commented on HIVE-5562:
-------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12608689/HIVE-5562.1.patch.txt

{color:green}SUCCESS:{color} +1 4428 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1171/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1171/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
>                 Key: HIVE-5562
>                 URL: https://issues.apache.org/jira/browse/HIVE-5562
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5562.1.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (one set per row group)
> and file-level column statistics for the entire file. It would be useful to also have
> stripe-level column statistics, which sit between index and file statistics. The reason
> to maintain stripe-level statistics is that the current input split computation logic is
> based on stripe boundaries. If stripe-level statistics are available and a stripe cannot
> satisfy a predicate condition, then that entire stripe (and with it the split) can be
> eliminated from split computation.
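
The stripe elimination described above can be illustrated with a minimal sketch. This is not the code from the attached patch and does not use the ORC reader API; the class StripePruning, the StripeStats holder, and the methods mightMatch and selectStripes are hypothetical names invented for illustration. It assumes each stripe records the minimum and maximum of one long column and that the predicate is a closed range lo <= col <= hi.

```java
import java.util.ArrayList;
import java.util.List;

public class StripePruning {
    /** Hypothetical per-stripe min/max summary for a single long column. */
    static final class StripeStats {
        final long min;
        final long max;

        StripeStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True if the stripe's [min, max] range overlaps the predicate range [lo, hi]. */
    static boolean mightMatch(StripeStats s, long lo, long hi) {
        return s.max >= lo && s.min <= hi;
    }

    /** Returns the indexes of stripes that still have to be read for lo <= col <= hi. */
    static List<Integer> selectStripes(List<StripeStats> stripes, long lo, long hi) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            if (mightMatch(stripes.get(i), lo, hi)) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<StripeStats> stripes = new ArrayList<>();
        stripes.add(new StripeStats(0, 99));
        stripes.add(new StripeStats(100, 199));
        stripes.add(new StripeStats(200, 299));
        // Stripe 0 cannot contain values in [150, 250], so it is dropped.
        System.out.println(selectStripes(stripes, 150, 250)); // prints [1, 2]
    }
}
```

Because the check uses only min and max, it can prove that a stripe contains no matching rows but never that it does; a stripe that survives may still yield zero rows after the row-group index statistics or the actual data are consulted.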



--
This message was sent by Atlassian JIRA
(v6.1#6144)
