Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 11 Sep 2013 17:12:52 +0000 (UTC)
From: "Yin Huai (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12658078.1373999406623.117014.1378919572242@arcas>
In-Reply-To: <JIRA.12658078.1373999406623@arcas>
References: <JIRA.12658078.1373999406623@arcas>
Subject: [jira] [Resolved] (HIVE-4868) When reading an ORC file by an MR
 job, some Mappers may not be able to process data in some cases
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HIVE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai resolved HIVE-4868.
----------------------------

    Resolution: Duplicate
    
> When reading an ORC file by an MR job, some Mappers may not be able to process data in some cases
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4868
>                 URL: https://issues.apache.org/jira/browse/HIVE-4868
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Suppose a stripe of an ORC file is 256 MB and the split size for an MR job is set to 64 MB. Right now, splits are created based on byte ranges alone, without regard to stripe boundaries.
> Here is an example:
> {code}
> |<-The start of a stripe                |<-The end of a stripe
> v                                       v
> |---------------------------------------|
>    ^                        ^ 
>    |<- The start of a split |<- The end of a split
> {code}
> So, for some Mappers, the byte range of the split may not contain the start of any stripe. Those Mappers will process zero records. We can improve how splits are created for ORC.
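
To make the numbers in the description concrete, here is a rough sketch of the situation. It is not Hive's or ORC's actual split logic. It assumes a hypothetical file of exactly two 256 MB stripes starting at offset 0 (the real file header and footer are ignored), and it assumes a Mapper reads a stripe only when that stripe's start offset falls inside its split, as the description implies. The names make_splits and stripes_for_split are made up for illustration.

```python
MB = 1024 * 1024


def make_splits(file_len, split_size):
    """Cut [0, file_len) into consecutive byte ranges of at most split_size."""
    return [(start, min(start + split_size, file_len))
            for start in range(0, file_len, split_size)]


def stripes_for_split(split, stripe_starts):
    """Return the stripe start offsets that fall inside the split.

    Assumption: a Mapper handles a stripe only if the stripe *starts*
    within its split's byte range [start, end).
    """
    start, end = split
    return [s for s in stripe_starts if start <= s < end]


# Hypothetical layout: two 256 MB stripes, 64 MB splits.
stripe_size = 256 * MB
file_len = 2 * stripe_size
stripe_starts = list(range(0, file_len, stripe_size))

splits = make_splits(file_len, 64 * MB)
assignment = [stripes_for_split(sp, stripe_starts) for sp in splits]
idle = sum(1 for stripes in assignment if not stripes)

print(f"{len(splits)} splits, {idle} contain no stripe start")
```

Under these assumptions only the splits beginning at 0 MB and 256 MB pick up a stripe, so three out of every four Mappers would have nothing to read.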
