hadoop-common-user mailing list archives

From: Ken Weiner <...@gumgum.com>
Subject: Re: Support for zipped input files
Date: Tue, 10 Mar 2009 16:42:51 GMT
Thanks very much, Tom.  You saved me a lot of time by confirming that it
isn't available yet.  I'll go vote for HADOOP-1824.

On Tue, Mar 10, 2009 at 3:23 AM, Tom White <tom@cloudera.com> wrote:

> Hi Ken,
>
> Unfortunately, Hadoop doesn't yet support MapReduce on zipped files
> (see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll
> need to write a program to unzip them and write them into HDFS first.
>
> Cheers,
> Tom
>
> On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop <jason.hadoop@gmail.com>
> wrote:
> > Hadoop has support for S3; the compression support is handled at another
> > level and should also work.
> >
> >
> > On Mon, Mar 9, 2009 at 9:05 PM, Ken Weiner <ken@gumgum.com> wrote:
> >
> >> I have a lot of large zipped (not gzipped) files sitting in an Amazon S3
> >> bucket that I want to process.  What is the easiest way to process them
> >> with a Hadoop map-reduce job?  Do I need to write code to transfer them
> >> out of S3, unzip them, and then move them to HDFS before running my job,
> >> or does Hadoop have support for processing zipped input files directly
> >> from S3?
> >>
> >
> >
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
> >
>
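In case it is useful to anyone else who finds this thread, here is roughly the
unzip-and-copy program I plan to write, following Tom's suggestion: it opens
each .zip archive in an S3 bucket through Hadoop's S3 native filesystem and
writes every entry out as a plain file in HDFS, where a normal MapReduce job
can pick them up.  This is an untested sketch -- the UnzipS3ToHdfs class name,
the s3n/hdfs paths, and the .zip filter are just placeholders, and the usual
fs.s3n credential properties would need to be set in the configuration.

import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UnzipS3ToHdfs {
  public static void main(String[] args) throws IOException {
    Path src = new Path(args[0]);  // e.g. s3n://my-bucket/zips/ (placeholder)
    Path dst = new Path(args[1]);  // e.g. hdfs://namenode/user/ken/unzipped/ (placeholder)

    Configuration conf = new Configuration();
    FileSystem srcFs = src.getFileSystem(conf);  // S3 native filesystem
    FileSystem dstFs = dst.getFileSystem(conf);  // HDFS

    for (FileStatus stat : srcFs.listStatus(src)) {
      if (!stat.getPath().getName().endsWith(".zip")) {
        continue;  // skip anything that isn't a zip archive
      }
      ZipInputStream zip = new ZipInputStream(srcFs.open(stat.getPath()));
      try {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
          if (entry.isDirectory()) {
            continue;
          }
          // Write each archive entry out as its own file under the destination dir.
          Path out = new Path(dst, stat.getPath().getName() + "-" + entry.getName());
          OutputStream os = dstFs.create(out);
          // close=false keeps the ZipInputStream open for the next entry.
          IOUtils.copyBytes(zip, os, conf, false);
          os.close();
        }
      } finally {
        zip.close();
      }
    }
  }
}

For what it's worth, Jason's point about compression being handled at another
level does apply to gzipped input, since Hadoop's CompressionCodecFactory
decompresses .gz files transparently when a job reads them, but as far as I can
tell there is no built-in codec for ZIP archives, which is why HADOOP-1824
matters here.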
