hive-dev mailing list archives

From "Rob Leidle (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13321) Add support for different output strategies
Date Mon, 21 Mar 2016 19:26:25 GMT
Rob Leidle created HIVE-13321:
---------------------------------

             Summary: Add support for different output strategies
                 Key: HIVE-13321
                 URL: https://issues.apache.org/jira/browse/HIVE-13321
             Project: Hive
          Issue Type: Improvement
            Reporter: Rob Leidle


The Hadoop ecosystem has expanded to support a wider variety of data stores and filesystems
than just HDFS. These FileSystems have different write atomicity and read consistency guarantees.
There are enhancements we can make to Hive so that it works better with this wider
variety of FileSystems. Work is already under way in the Hadoop project
to robustly support these FileSystems. One example is HADOOP-9565, where the behavior
of MapReduce output is enhanced to do what is optimal for different FileSystems.
 
A common pattern in MapReduce and Hive is to write all output into a temporary folder and
then rename this temporary folder to match the final output location. With some of the
newer FileSystems, Hive performance can be improved by writing output directly to the
final location and avoiding the temporary folder write and rename.
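
For illustration only, the temporary-folder pattern can be sketched on a local filesystem
roughly as follows. This is not Hive or MapReduce code; the function name
write_with_rename and the "_tmp." prefix are hypothetical, and the sketch assumes the
final output directory does not exist yet.

```python
import os
import tempfile


def write_with_rename(final_dir, files):
    """Stage all output in a temporary sibling directory, then rename it into place.

    `files` maps file names to their text content (hypothetical input shape).
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    os.makedirs(parent, exist_ok=True)

    # Stage next to the destination so the rename stays on the same filesystem.
    tmp_dir = tempfile.mkdtemp(prefix="_tmp.", dir=parent)
    for name, data in files.items():
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write(data)

    # One rename publishes all output at once. Assumes final_dir does not exist.
    os.rename(tmp_dir, final_dir)
    return final_dir
```

On a filesystem where a directory rename is a cheap metadata operation, the last step
costs little. Where a rename is implemented as copy and delete, that step can dominate
the job, which is the cost this proposal aims to avoid.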
 
The proposal is to enhance Hive to support different strategies for file output. One such
strategy would be a concept named “DirectWrite”. DirectWrite will be optionally enabled,
likely on a per-FileSystem basis. When DirectWrite is enabled, all Hive job output will be
written directly to the output location.
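
As a rough sketch of what per-FileSystem selection might look like, the strategy could be
chosen from the URI scheme of the output location. Everything here is an assumption made
for illustration: the names DIRECT_WRITE_SCHEMES, choose_output_strategy and
write_location, the "_tmp" staging folder, and the choice of "s3a" as an enabled scheme
are not taken from Hive or from this JIRA.

```python
from urllib.parse import urlparse

# Hypothetical setting: URI schemes for which DirectWrite is enabled.
DIRECT_WRITE_SCHEMES = frozenset({"s3a"})


def choose_output_strategy(output_uri, direct_schemes=DIRECT_WRITE_SCHEMES):
    """Return the name of the output strategy for the given output location."""
    scheme = urlparse(output_uri).scheme
    return "direct_write" if scheme in direct_schemes else "temp_and_rename"


def write_location(output_uri, direct_schemes=DIRECT_WRITE_SCHEMES):
    """Return the location that job output would be written to first."""
    if choose_output_strategy(output_uri, direct_schemes) == "direct_write":
        # DirectWrite: output goes straight to the final location.
        return output_uri
    # Default: stage under a temporary folder that is renamed afterwards.
    return output_uri.rstrip("/") + "/_tmp"
```

Keeping the default as the temporary-folder strategy matches the description of
DirectWrite as something that is optionally enabled.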
 
This is an umbrella JIRA for all the tasks related to this functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
