hive-user mailing list archives

From Mark Grover <grover.markgro...@gmail.com>
Subject Re: Hive Loading Zip CSV Files
Date Tue, 13 Nov 2012 18:54:19 GMT
bcc: cdh-user

This question might be more appropriate for the Apache Hive user list, so
redirecting it there.

However, to answer your question:
From the little I've read about PKZip, it follows the standard zip format.
So the question you are really asking is if Hive supports reading from zip
files. As far as I know, the answer is no. This is because Hadoop doesn't
have an InputFormat for reading zip files:
https://issues.apache.org/jira/browse/MAPREDUCE-210
There is also a Hive user email thread that tackles the same question:
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E

Having said that, a possible workaround would be to unzip the zip files
before loading them and store the data on HDFS as SequenceFiles with a
different compression codec (e.g. Snappy).
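
As a rough illustration of the "unzip first" step, here is a minimal
sketch using only the Python standard library. It is an assumption-laden
example, not something tested against your data: the function name
zip_to_gzip_csvs and the paths are hypothetical, and it re-compresses to
gzip rather than Snappy, because gzip ships with Python and Hadoop picks
up the .gz extension on plain text files.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip_csvs(zip_path, out_dir):
    """Extract each .csv member of a zip archive and re-compress it as .csv.gz.

    Hypothetical helper: the name and the flat output layout are
    illustrative choices, not part of Hive or Hadoop.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            # Skip directories and anything that is not a CSV file.
            if member.is_dir() or not member.filename.lower().endswith(".csv"):
                continue
            # Drop any directory prefix stored inside the archive.
            target = out_dir / (Path(member.filename).name + ".gz")
            # Stream the member so large files are never held fully in memory.
            with archive.open(member) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

The resulting .csv.gz files could then be copied to HDFS (for example
with hadoop fs -put) and read through a text-format staging table, from
which an INSERT into a SequenceFile table would do the conversion. Keep
in mind that gzip files are not splittable, so each one is handled by a
single mapper.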

Good luck!
Mark



On Tue, Nov 13, 2012 at 9:17 AM, ben <bbuild11@gmail.com> wrote:

> Anybody ever try to load CSV files compressed using PKZip into a Hive
> table stored as Sequence Files? Is there a SerDe out there for this?
>
> Thanks,
> Ben
