hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-16870) Give Hive the ability to suppress output of empty files
Date Fri, 09 Jun 2017 18:14:18 GMT


Ashutosh Chauhan commented on HIVE-16870:

Dupe of HIVE-13040?

> Give Hive the ability to suppress output of empty files
> -------------------------------------------------------
>                 Key: HIVE-16870
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: StorageHandler
>            Reporter: Stephen Measmer
> Today some Hive queries using joins can output zero-byte files, particularly on large joins. This can have a negative effect on HDFS, as it can lead to too many small files [1].
> A Cloudera Community thread [2] suggests using an OutputFormat of LazyOutputFormat, because MapReduce can then be set to suppress the generation of empty (zero-byte) files.
> But it's not possible to create a table with an OutputFormat of just LazyOutputFormat in Hive. Below is what we found when testing.
> create table mytable (fip int, state string, zip string, level int)
>   STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat';
> ------------
> Error: Error while compiling statement: FAILED: SemanticException [Error 10055]: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat (state=42000,code=10055)
> [1]
> [2]
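
For readers unfamiliar with the idea behind LazyOutputFormat mentioned in the quoted description: the output file is created only when the first record is written, so a task that emits nothing leaves no zero-byte file behind. The following is a minimal Python sketch of that idea only. It is not Hadoop's or Hive's implementation, and the names LazyWriter and run_task are hypothetical, chosen for illustration.

```python
import os
import tempfile


class LazyWriter:
    """Hypothetical helper: opens its file only when the first record arrives."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # nothing is created on disk yet

    def write(self, record):
        if self._fh is None:
            # First record seen: create the output file now.
            self._fh = open(self.path, "w")
        self._fh.write(record + "\n")

    def close(self):
        # Closing a writer that never received a record creates no file.
        if self._fh is not None:
            self._fh.close()
            self._fh = None


def run_task(path, records):
    """Write records to path and report whether an output file was produced."""
    writer = LazyWriter(path)
    try:
        for record in records:
            writer.write(record)
    finally:
        writer.close()
    return os.path.exists(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # A task with no rows leaves no file behind.
        print(run_task(os.path.join(out_dir, "part-00000"), []))
        # A task with at least one row produces its file as usual.
        print(run_task(os.path.join(out_dir, "part-00001"), ["1\tOH\t43004\t2"]))
```

An eager writer would open the file in its constructor and so always leave a file, empty or not. Deferring the open to the first write is what avoids the empty output.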

