hive-user mailing list archives

From Saurabh Nanda <saurabhna...@gmail.com>
Subject Re: bz2 Splits.
Date Fri, 24 Jul 2009 15:09:51 GMT
Please excuse my ignorance, but can I import gzip-compressed files directly
as Hive tables? I have a separate gzip file for each day's weblog data. Right
now I am gunzipping them and then importing them into a raw table. Can I
import the gzipped files directly into Hive?

Saurabh.
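At its core, Saurabh's question is whether a reader can consume the compressed stream directly instead of gunzipping to disk first. The sketch below is only an analogy in plain Python. It is not Hive's or Hadoop's API. It picks a decompressor from the file suffix, roughly the way Hadoop chooses a codec by file name, and streams records out of .gz or .bz2 files without writing an uncompressed copy. The `OPENERS` table, `open_log`, `count_records`, and the sample file name are all hypothetical.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical suffix table, loosely analogous to choosing a codec by file name.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_log(path):
    """Open a log file as text, decompressing on the fly based on its suffix."""
    ext = os.path.splitext(path)[1]
    opener = OPENERS.get(ext, open)  # fall back to plain, uncompressed text
    return opener(path, "rt", encoding="utf-8")


def count_records(paths):
    """Count newline-delimited records without writing an uncompressed copy."""
    total = 0
    for path in paths:
        with open_log(path) as f:
            total += sum(1 for _ in f)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical daily weblog file name.
        path = os.path.join(d, "weblog-2009-07-24.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write("GET /a\nGET /b\nGET /c\n")
        print(count_records([path]))  # prints 3
```

Even when it is read directly, a single gzip stream still has to be decoded from its first byte. That is the splitting issue raised in the quoted thread.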

On Wed, Jul 22, 2009 at 1:07 AM, Ashish Thusoo <athusoo@facebook.com> wrote:

> I don't think these are splittable. Compression on SequenceFiles is
> splittable across SequenceFile blocks.
>
> Ashish
>
> -----Original Message-----
> From: Bill Craig [mailto:bcraig7@gmail.com]
> Sent: Tuesday, July 21, 2009 8:06 AM
> To: hive-user@hadoop.apache.org
> Subject: bz2 Splits.
>
> I loaded 5 files of bzip2-compressed data into a table in Hive. Three are
> small test files containing 10,000 records; the other two are large, ~8 GB
> compressed. When I run a query against the table I see three tasks that
> complete almost immediately and two tasks that run for a very long time. It
> appears that Hive/Hadoop is not splitting the input of the *.bz2 files. I
> have seen some old mails about this but could not find a resolution. I
> compressed the files using the Apache bz2 jar, and the files are named
> *.bz2. I am using Hadoop 0.19.1 r745977.
>
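Ashish's point is that one continuous compressed stream has to be decoded from its first byte. A block-compressed container is different: it restarts compression at each block, so a reader can begin at any block boundary. Bill's observation of one long-running task per large file fits the first case. The Python sketch below illustrates that contrast with the standard `gzip` and `zlib` modules. The function names and the offset list are hypothetical. This is not the SequenceFile on-disk format, which marks block boundaries with sync markers rather than an offset list. The sketch only shows why independently compressed blocks are what make a split possible.

```python
import gzip
import zlib


def whole_stream(records):
    """Compress all records as ONE gzip stream, like a single .gz file."""
    return gzip.compress("".join(records).encode("utf-8"))


def can_start_at(blob, offset):
    """Try to decompress a single stream starting at an arbitrary byte offset."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):  # bad header, truncated or corrupt data
        return False


def block_compress(records, records_per_block):
    """Compress each group of records independently; return (block offsets, data)."""
    offsets, out = [], bytearray()
    for i in range(0, len(records), records_per_block):
        chunk = "".join(records[i:i + records_per_block]).encode("utf-8")
        offsets.append(len(out))  # a reader may start fresh at this byte
        out += zlib.compress(chunk)
    return offsets, bytes(out)


def read_block(data, start):
    """Decompress only the block that begins at byte offset `start`."""
    d = zlib.decompressobj()
    text = d.decompress(data[start:])  # stops at this block's end; rest -> d.unused_data
    return text.decode("utf-8")


if __name__ == "__main__":
    records = [f"line {i}\n" for i in range(1000)]
    blob = whole_stream(records)
    print(can_start_at(blob, 0))               # True
    print(can_start_at(blob, len(blob) // 2))  # False: no header mid-stream
    offsets, data = block_compress(records, 250)
    print(len(offsets))                                  # 4
    print(read_block(data, offsets[2]).splitlines()[0])  # line 500
```

In the block version, each offset is a point where an independent reader, such as a separate map task, could start without touching the bytes before it.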



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
