hive-dev mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-4868) When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
Date Tue, 16 Jul 2013 18:30:49 GMT
Yin Huai created HIVE-4868:
------------------------------

             Summary: When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
                 Key: HIVE-4868
                 URL: https://issues.apache.org/jira/browse/HIVE-4868
             Project: Hive
          Issue Type: Improvement
            Reporter: Yin Huai


Suppose a stripe of an ORC file is 256 MB and the split size for an MR job is set to 64
MB. Right now, splits are created from byte ranges alone, so a single stripe can span several splits.
Here is an example:
{code}
|<-The start of a stripe                |<-The end of a stripe
v                                       v
|---------------------------------------|
   ^                        ^ 
   |<- The start of a split |<- The end of a split
{code}

So, for some Mappers, the byte range of the split may not contain the start of any stripe.
Those Mappers will process zero records. We can improve how splits are created for
ORC.
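
To make the arithmetic concrete, here is a small Python sketch of the situation. It is not Hive code: the helper names (byte_range_splits, stripes_starting_in, stripe_aligned_splits) are hypothetical, and it assumes a file that holds exactly one 256 MB stripe starting at offset 0, ignoring the ORC header and footer. It also assumes that a stripe is read by the split whose byte range contains the stripe's first byte, as described above.

```python
MB = 1024 * 1024


def byte_range_splits(file_length, split_size):
    """Cut [0, file_length) into consecutive (start, end) byte ranges."""
    return [(start, min(start + split_size, file_length))
            for start in range(0, file_length, split_size)]


def stripes_starting_in(split, stripe_offsets):
    """Return the stripe offsets whose first byte falls inside the split."""
    start, end = split
    return [off for off in stripe_offsets if start <= off < end]


def stripe_aligned_splits(stripe_offsets, stripe_lengths):
    """One possible alternative: one split per stripe (an assumption,
    not necessarily the fix that will be adopted for this issue)."""
    return [(off, off + length)
            for off, length in zip(stripe_offsets, stripe_lengths)]


# Simplified file layout: a single 256 MB stripe at offset 0.
stripe_offsets = [0]
stripe_lengths = [256 * MB]

splits = byte_range_splits(256 * MB, 64 * MB)
assignment = {s: stripes_starting_in(s, stripe_offsets) for s in splits}
empty_splits = [s for s, stripes in assignment.items() if not stripes]

print("splits:", len(splits))
print("splits with no stripe start:", len(empty_splits))
print("stripe-aligned splits:", stripe_aligned_splits(stripe_offsets, stripe_lengths))
```

Under these assumptions the job gets four splits, but only the first one contains the start of the stripe, so the other three Mappers have nothing to read. Creating splits on stripe boundaries would avoid the empty splits, at the cost of split sizes that follow the stripe size rather than the configured split size.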



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
