hive-dev mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4868) When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
Date Wed, 11 Sep 2013 17:12:51 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764500#comment-13764500 ]

Yin Huai commented on HIVE-4868:
--------------------------------

HIVE-5102 will address this issue.
                
> When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4868
>                 URL: https://issues.apache.org/jira/browse/HIVE-4868
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Let's say a stripe of an ORC file is 256 MB and we set the split size for an MR job to 64 MB. Right now, splits are created based on byte ranges.
> Here is an example:
> {code}
> |<-The start of a stripe                |<-The end of a stripe
> v                                       v
> |---------------------------------------|
>    ^                        ^ 
>    |<- The start of a split |<- The end of a split
> {code}
> So, for some Mappers, it is possible that no stripe starts within the byte range of their split. Those Mappers will process 0 records. We can improve how splits are created for ORC.
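
To make the description above concrete, here is a minimal sketch of byte-range splitting over a file made of two 256 MB stripes with a 64 MB split size. It is an illustration only, not Hive's ORC code: the function names, the two-stripe layout, and the stripe offsets (which ignore any file header) are assumptions. The one rule taken from the description is that a Mapper only processes stripes whose start falls inside its split's byte range.

```python
MB = 1024 * 1024


def byte_range_splits(file_len, split_size):
    """Cut [0, file_len) into consecutive (start, end) byte ranges."""
    return [(start, min(start + split_size, file_len))
            for start in range(0, file_len, split_size)]


def stripes_for_split(stripe_offsets, split):
    """Return the stripe start offsets that fall inside the split's range."""
    start, end = split
    return [offset for offset in stripe_offsets if start <= offset < end]


# Hypothetical layout: two 256 MB stripes laid out back to back.
stripe_offsets = [0, 256 * MB]
file_len = 512 * MB

splits = byte_range_splits(file_len, 64 * MB)
assignment = [stripes_for_split(stripe_offsets, s) for s in splits]

# Splits with no stripe start correspond to Mappers that read 0 records.
empty = sum(1 for stripes in assignment if not stripes)
print(f"{empty} of {len(splits)} splits contain no stripe start")
```

With this layout, only the splits beginning at offsets 0 and 256 MB contain a stripe start, so each of those two Mappers reads a whole 256 MB stripe while the remaining Mappers have nothing to process.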

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
