hive-user mailing list archives

From Shantian Purkad <>
Subject What is best way to load data into hive tables/hadoop file system
Date Mon, 31 Oct 2011 23:33:41 GMT

We have multiple terabytes of data, currently stored as gz files of approximately 2GB each. What
is the best way to load this data into Hadoop?

We have observed (especially when loading with Hive's LOAD DATA LOCAL INPATH ....) that a gz file
takes around 12 seconds to load, while the same file decompressed (around 4~5GB) takes 8 minutes
to load.

We want these files to be processed by multiple mappers in Hadoop, not by a single mapper per file.
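For context, here is a minimal Python sketch (a simplified model, not Hadoop's actual code) of roughly how Hadoop's FileInputFormat decides how many input splits, and therefore map tasks, one file gets. It assumes a 64 MB HDFS block size and the default split slop of 1.1; the helper names `split_size` and `num_map_tasks` are hypothetical. It illustrates why a gzip file, which Hadoop cannot split, is read by a single mapper no matter how large it is.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Assumed values for illustration only; real values come from the cluster
# configuration (HDFS block size and the min/max split-size settings).
BLOCK_SIZE = 64 * MB   # assumed HDFS block size
SPLIT_SLOP = 1.1       # the last split may be up to 10% larger than split_size


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Simplified form of FileInputFormat's rule: max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))


def num_map_tasks(file_size, splittable, block_size=BLOCK_SIZE):
    """Estimate the number of input splits (map tasks) for one non-empty file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if not splittable:
        # A gzip file cannot be split, so one mapper must read the whole file.
        return 1
    size = split_size(block_size)
    splits = 0
    remaining = file_size
    # Cut full-size splits while more than SPLIT_SLOP splits' worth of data remains.
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    # Whatever is left (at most 1.1 splits' worth) becomes the final split.
    if remaining != 0:
        splits += 1
    return splits


if __name__ == "__main__":
    print("2GB gz file:", num_map_tasks(2 * GB, splittable=False))
    print("4.5GB uncompressed file:", num_map_tasks(int(4.5 * GB), splittable=True))
```

Under these assumptions, each 2GB gz file maps to one task, while a 4.5GB uncompressed file yields 72 splits of 64 MB each.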

What would be the best way to load these files into Hive/HDFS so that loading takes less time
and the files can still be processed by multiple mappers?

Thanks and Regards,
