crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Writing MapFile through Crunch, issue reading through Hadoop
Date Mon, 09 Sep 2013 17:44:08 GMT
Tough to assign blame here: writing a _SUCCESS file is usually a good
thing, and most Hadoop file formats are smart about filtering out files
that start with "_" or ".", or let you specify a PathFilter instance
that can be used to ignore hidden files.
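A minimal sketch of that filtering convention, written in plain Java so it runs without a cluster. The class and method names (HiddenFileFilterSketch, isHidden, selectParts) are hypothetical. In real Hadoop code the same predicate would sit inside an org.apache.hadoop.fs.PathFilter passed to FileSystem.listStatus(Path, PathFilter), and you would then open one MapFile.Reader per surviving directory yourself rather than relying on getReaders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the "skip hidden files" convention: entries whose
 * names begin with "_" (e.g. _SUCCESS, _logs) or "." (e.g. checksum files)
 * are treated as job bookkeeping, not data.
 */
public class HiddenFileFilterSketch {

    /** True for bookkeeping entries that a reader should ignore. */
    public static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /**
     * Keeps only the data entries and sorts them, so that index i in the
     * result lines up with partition i (part-r-00000, part-r-00001, ...),
     * which a hash-partitioned MapFile lookup depends on.
     */
    public static List<String> selectParts(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) {
        // Example directory listing as it might look after a job completes.
        List<String> listing = Arrays.asList(
            "part-r-00001", "_SUCCESS", "part-r-00000", "_logs");
        System.out.println(selectParts(listing)); // prints [part-r-00000, part-r-00001]
    }
}
```

Filtering on the leading character, rather than matching _SUCCESS by name, also skips other bookkeeping entries such as _logs.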

One way around this would be to add an option to Targets that would disable
writing the _SUCCESS flag, which would be part of a more general change to
allow per-Source and per-Target configuration options. For example, you
could specify that some outputs of an MR job be compressed using gzip
and others using Snappy, instead of having a single compression strategy
for everything.



On Mon, Sep 9, 2013 at 10:28 AM, Hansen,Chuck <Chuck.Hansen@cerner.com> wrote:

> With Crunch versions prior to 0.7.x, there does not appear to be a
> _SUCCESS file written upon completion; starting with 0.7.x there is. This
> file (and any others not intended to be read through [1]) appears to cause
> issues with [1]. This means writing a MapFile with Crunch and reading it
> back with [1] works prior to 0.7.x, but starting with 0.7.x, [1] throws an
> exception.
>
>  Is this a bug with Crunch and/or Hadoop?
>
> [1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders
>
> Hadoop CDH versions used:
>
>     <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>
>
>     <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>
>
>  --
>  *Chuck Hansen*
> Software Engineer, Record Dev
> chuck.hansen@cerner.com | 816-201-9629
> Cerner Corporation | www.cerner.com



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
