hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Anyway to avoid creating subdirectories by "Insert with union"
Date Wed, 24 Feb 2016 07:37:41 GMT

>Is there anyway to avoid creating sub-directories while running in tez?
>Or this is by design and can not be changed?

Yes, this is by design. The Tez execution of UNION is entirely parallel &
the task-ids overlap - so the files created have to have unique names.

But the total counts for "Map 1" and "Map 2" are only available as the job
runs, so they write to different dirs.
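
To make the naming clash concrete, here is a toy Python model (not Hive or
Tez code). It assumes each task names its output file after its own task-id
and that every branch of the UNION numbers its tasks from 0; the file-name
pattern and the subdirectory names "1" and "2" are illustrative assumptions.

```python
# Toy model only: the file-name pattern and directory names below are
# assumptions for illustration, not Hive's actual implementation.

def output_files(task_counts, use_subdirs):
    """Return the output path each task would write, one entry per task."""
    paths = []
    for subdir, num_tasks in enumerate(task_counts.values(), start=1):
        for task_id in range(num_tasks):
            name = f"{task_id:06d}_0"  # task-ids restart at 0 in every branch
            paths.append(f"{subdir}/{name}" if use_subdirs else name)
    return paths

counts = {"Map 1": 3, "Map 2": 2}

flat = output_files(counts, use_subdirs=False)
nested = output_files(counts, use_subdirs=True)

# Paths written by more than one task, i.e. files that would be clobbered.
flat_collisions = len(flat) - len(set(flat))
nested_collisions = len(nested) - len(set(nested))

print(flat_collisions, nested_collisions)
```

In this model the flat layout has clashing names wherever the two branches
share a task-id, while one directory per branch keeps every path unique
without either branch needing to know the other's task count.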

Here's a comparison of MapReduce vs Tez (from 2014, some slides are out of
date now).

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/15


This UNION method is faster because of fewer intermediate HDFS writes &
mapreduce.input.fileinputformat.input.dir.recursive=true kicks in as long
as your cluster runs YARN (which it does, because otherwise Tez wouldn't
work).
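
As a rough analogy for what recursive input listing buys you, the Python
sketch below builds a throwaway directory with two subdirectories and lists
it both ways. The helper list_input_files and the layout are hypothetical;
this only mimics the idea of the setting and is not how Hadoop implements it.

```python
import os
import tempfile

def list_input_files(root, recursive):
    """List data files under root, optionally descending into subdirectories."""
    if not recursive:
        # Non-recursive: only files sitting directly in the top-level directory.
        return sorted(e.name for e in os.scandir(root) if e.is_file())
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for f in files:
            found.append(os.path.relpath(os.path.join(dirpath, f), root))
    return sorted(found)

with tempfile.TemporaryDirectory() as table_dir:
    # Assumed layout: one subdirectory per UNION branch, one file in each.
    for subdir in ("1", "2"):
        os.makedirs(os.path.join(table_dir, subdir))
        open(os.path.join(table_dir, subdir, "000000_0"), "w").close()

    flat_listing = list_input_files(table_dir, recursive=False)
    recursive_listing = list_input_files(table_dir, recursive=True)

print(flat_listing, recursive_listing)
```

A reader that only looks at the top level sees no files at all here, while a
recursive one picks up both, which is why the extra directories are harmless
to readers that honour the recursive setting.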

Cheers,
Gopal


