hive-dev mailing list archives

From "Ankit Malhotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4175) Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
Date Wed, 16 Oct 2013 21:40:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797293#comment-13797293 ]

Ankit Malhotra commented on HIVE-4175:
--------------------------------------

I'm on Hive 0.10. Correct me if I'm wrong, but might HIVE-3833 fix this?

Unfortunately, we're on CDH-4.2.0 and Hive 0.10 and can't see ourselves upgrading any time soon.
I'm open to any workarounds, the most extreme being not having empty partitions at all.

> Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-4175
>                 URL: https://issues.apache.org/jira/browse/HIVE-4175
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: CDH4.2, using MR1
>            Reporter: James Kebinger
>            Priority: Minor
>
> My deserializer is expecting to receive one of 2 different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where I observe the file called "emptyFile" in the list of input splits.
> I'm doing queries over an external year/month/day partitioned table for which I have eagerly created partitions, so as of today, for example, I may do a query where year = 2013 and month = 3, which includes empty partitions.
> In the course of investigation I downloaded the sequence files to confirm they were ok. Once I realized that processing of empty partitions was to blame, I was able to work around the issue by bounding my queries to populated partitions.
> Can the need for the emptyFile be eliminated in the case where there's already a bunch of splits being processed? Failing that, can the mapper detect that the current input is from emptyFile and not call the deserializer?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
