hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
Date Mon, 22 Jun 2015 23:30:01 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596843#comment-14596843 ]

Gopal V commented on HIVE-11043:
--------------------------------

bq. 3) ... In which case we will end up using BI as default even though there are only small
number of files.
bq. 5) Should we make this independently configurable? Instead of using the cache max size.

The max cache size is a safety limit for huge clusters; it is not a configuration requirement.

If you need to change the behaviour explicitly, the right config to change is the split
strategy itself: set it to ETL or BI, whichever one you prefer.
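
To illustrate the point above, here is a minimal Python sketch of how an explicit
strategy setting might take precedence over an adaptive (hybrid) choice. The function,
parameter names, and thresholds are hypothetical and are not taken from the HIVE-11043
patch. In particular, using the cache max size as the file-count cutoff is an assumption
based on the discussion in this thread.

```python
from enum import Enum


class SplitStrategy(Enum):
    HYBRID = "HYBRID"
    BI = "BI"
    ETL = "ETL"


def choose_strategy(configured, num_files, avg_file_size,
                    max_split_size, max_cached_files):
    """Illustrative only: names and thresholds are hypothetical, not Hive's code."""
    # An explicit ETL/BI setting always wins over the adaptive heuristic.
    if configured is not SplitStrategy.HYBRID:
        return configured
    # Large files: the average size exceeds the split size, so pick ETL.
    if avg_file_size > max_split_size:
        return SplitStrategy.ETL
    # Few files (assumed cutoff: the cache max size): ETL is still affordable.
    if num_files <= max_cached_files:
        return SplitStrategy.ETL
    # Many small files: fall back to BI.
    return SplitStrategy.BI
```

In Hive, the corresponding setting should be `hive.exec.orc.split.strategy` (values
HYBRID, BI, ETL); check the configuration documentation for your version before relying
on it.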

> ORC split strategies should adapt based on number of files
> ----------------------------------------------------------
>
>                 Key: HIVE-11043
>                 URL: https://issues.apache.org/jira/browse/HIVE-11043
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Gopal V
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11043.1.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average file size.
> It would be beneficial to choose a different strategy based on number of files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
