hive-user mailing list archives

From "Alexander C.H. Lorenz" <wget.n...@googlemail.com>
Subject Re: What is best way to load data into hive tables/hadoop file system
Date Wed, 02 Nov 2011 10:06:05 GMT
bzip2 or the snappy codec will be very useful for that.
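As a rough sketch of the bzip2 route (the file and table names here are made up for illustration, and the Hive step assumes a running Hive installation, so it is shown commented out):

```shell
# Hypothetical sketch: create a sample gzip file, then recompress it to
# bzip2. gzip is not splittable, but bzip2 is, so one large .bz2 file can
# be read by several mappers at once (at the cost of slower compression).
printf 'a\t1\nb\t2\n' | gzip -c > events.log.gz
gunzip -c events.log.gz | bzip2 -c > events.log.bz2

# Then load it into Hive (needs a live Hive install, so shown as a comment):
# hive -e "LOAD DATA LOCAL INPATH 'events.log.bz2' INTO TABLE events;"
```

Hive recognizes the `.bz2` extension and decompresses at query time, so no decompression is needed before loading.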

- Alex

On Wed, Nov 2, 2011 at 11:00 AM, Martin Kuhn <martin.kuhn@affinitas.de> wrote:

> You could try to use splittable LZO compression instead:
> https://github.com/kevinweil/hadoop-lzo (a gz file can't be split)
>
>
> > We have multiple terabytes of data (currently in gz format, approx.
> 2 GB per file). What is the best way to load that data into Hadoop?
>
> > We have seen (especially when loading with Hive's LOAD DATA LOCAL
> INPATH ....) that a gz file takes around 12 seconds to load, while the
> decompressed file (around 4~5 GB) takes 8 minutes to load.
>
> > We want these files to be processed by multiple mappers on Hadoop,
> not by a single one.
>
> > What would be the best way to load these files into Hive/HDFS so that
> loading takes less time and multiple mappers can process the files?
>
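The splittable-LZO route suggested above usually involves an extra indexing step; here is a hedged sketch (paths, file names, and the jar location are placeholders; `LzoIndexer` is the indexer class from the linked hadoop-lzo project). Since the commands need a Hadoop cluster, this block only writes them to a helper script:

```shell
# Hypothetical sketch of the hadoop-lzo workflow. These commands require a
# Hadoop cluster with hadoop-lzo installed, so they are written to a helper
# script rather than executed here. All paths and names are placeholders.
cat > index_lzo.sh <<'EOF'
#!/bin/sh
# Compress locally; lzop produces events.log.lzo
lzop events.log
# Put the compressed file into HDFS
hadoop fs -put events.log.lzo /data/
# Build the split index (events.log.lzo.index) so MapReduce can run
# multiple mappers against the single .lzo file
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer /data/events.log.lzo
EOF
chmod +x index_lzo.sh
```

Without the index file, an .lzo input is still read by a single mapper; the indexing step is what makes it splittable.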



-- 
Alexander Lorenz
http://mapredit.blogspot.com

*Think of the environment: please don't print this email unless you
really need to.*
