hive-user mailing list archives

From Martin Kuhn <martin.k...@affinitas.de>
Subject Re: What is best way to load data into hive tables/hadoop file system
Date Wed, 02 Nov 2011 10:00:41 GMT
You could try splittable LZO compression instead: https://github.com/kevinweil/hadoop-lzo
(a gz file can't be split, so each gz file ends up being processed by a single mapper)
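
To make the splitting point concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop's actual split logic, only an approximation: the 64 MB split size and the 4.5 GB uncompressed size (midpoint of the "4~5GB" quoted below) are assumptions, and estimated_mappers is a hypothetical helper name.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumption: one input split per 64 MB block; real clusters may be configured differently.
ASSUMED_SPLIT_BYTES = 64 * MIB


def estimated_mappers(file_bytes: int, split_bytes: int, splittable: bool) -> int:
    """Roughly estimate how many map tasks a single input file produces."""
    if file_bytes <= 0:
        return 0
    if not splittable:
        # A non-splittable file (e.g. gzip) must be read start to end by one mapper.
        return 1
    return math.ceil(file_bytes / split_bytes)


# One 2 GB gz file: not splittable, so a single mapper regardless of size.
gz_mappers = estimated_mappers(2 * GIB, ASSUMED_SPLIT_BYTES, splittable=False)

# The same data uncompressed (assumed 4.5 GB): splittable plain text.
plain_mappers = estimated_mappers(int(4.5 * GIB), ASSUMED_SPLIT_BYTES, splittable=True)

print(gz_mappers)     # 1
print(plain_mappers)  # 72
```

A file in a splittable compressed format would fall between the two cases: smaller on disk than the plain file, but still divided among several mappers.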


> We have multiple terabytes of data (currently in gz format, approx. 2 GB per file).
> What is the best way to load that data into Hadoop?

> We have seen that (especially when loading with Hive's load data local inpath ....)
> loading a gz file takes around 12 seconds, whereas loading the decompressed file
> (around 4~5 GB) takes 8 minutes.

> We want these files to be processed by multiple mappers on Hadoop, not by a single
> mapper.

> What would be the best way to load these files into Hive/HDFS so that loading takes
> less time and multiple mappers are used to process the files?
