hive-user mailing list archives

From Edward Capriolo <>
Subject Re: best way to load millions of gzip files in hdfs to one table in hive?
Date Tue, 02 Oct 2012 21:45:18 GMT
You may want to use:

We use this to deal with pathological cases, although the best idea is
to avoid such files altogether.


On Tue, Oct 2, 2012 at 4:16 PM, Alexander Pivovarov
<> wrote:
> Options
> 1. create table and put files under the table dir
> 2. create external table and point it to files dir
> 3. if the files are small, I recommend creating a new set of files with a
> simple MR program, specifying the number of reduce tasks. The goal is to make
> the file size > the HDFS block size (this saves NameNode memory, and reads will be faster)
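
The options above can be sketched in HiveQL. This is only an illustration: the table name, columns, delimiter, and HDFS path are assumptions, and the reducer count is arbitrary. Hive decompresses gzip text files transparently for TEXTFILE tables, so option 2 needs no separate load step:

```sql
-- Option 2 sketch: an external table pointing at the existing HDFS dir.
-- Table name, schema, delimiter, and path are placeholders.
CREATE EXTERNAL TABLE mylogs (
  col1 STRING,
  col2 INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/me/gzip_files';

-- Option 3 sketch: compact millions of small files by rewriting them
-- through a reduce stage; the reducer count bounds the number of
-- output files.
SET mapred.reduce.tasks=128;

CREATE TABLE mylogs_compacted (
  col1 STRING,
  col2 INT
)
STORED AS TEXTFILE;

INSERT OVERWRITE TABLE mylogs_compacted
SELECT * FROM mylogs
DISTRIBUTE BY col1;  -- forces a reduce phase so small inputs get merged
```

Note that gzip files are not splittable, so each one is read by a single mapper; keeping the compacted files near the HDFS block size avoids both the small-files NameNode overhead and single-mapper bottlenecks on oversized files.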
> On Tue, Oct 2, 2012 at 3:53 PM, zuohua zhang <> wrote:
>> I have millions of gzip files in hdfs (with the same fields), would like
>> to load them into one table in hive with a specified schema.
>> What is the most efficient way to do that?
>> Given that my data is already in hdfs, and also gzipped, does that mean I
>> could simply set up the table over it, bypassing the unnecessary
>> overhead of the typical load approach?
>> Thanks!
