hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Cloudera 18.3 splits bz2 inputs
Date Mon, 16 Nov 2009 21:55:28 GMT
Hi Usman/Mike,

This feature is slated for 0.21 (not 0.20.1).

We have not backported it into Cloudera's release of 0.20.1, though we'll
certainly consider doing so if there appears to be demand for it in the
community. Anecdotally, we've seen that few people use bzip2, since its CPU
overhead is high enough to outweigh the space savings.

-Todd

On Sat, Nov 14, 2009 at 10:30 AM, Mike Kendall <mkendall@justin.tv> wrote:

> it's gonna be in 20.1...  :(
>
> On Sat, Nov 14, 2009 at 12:34 AM, Usman Waheed <usmanw@opera.com> wrote:
>
> > Hi,
> >
> > I was under the impression that Cloudera's 18.3 can split bz2 input logs
> > during the map phase, is that not so?
> > As of now i see each bz2 file being processed in one entire map task in my
> > running jobs.
> > Maybe i am missing something here.
> >
> > Thanks,
> > Usman
> >
>
