From: Josh Wills
Date: Mon, 9 Sep 2013 10:44:08 -0700
Subject: Re: Writing MapFile through Crunch, issue reading through Hadoop
To: user@crunch.apache.org
In-Reply-To: <2493CAF423E7104B9F5C6D863AEAD3C00DB3915C@CERNMSGLS5MB3A.cerner.net>

Tough to assign blame here: writing a _SUCCESS bit is usually a good thing, and most Hadoop file formats are smart about filtering out files that start with "_" or ".", or allowing you to specify an instance of PathFilter that can be used to ignore hidden files.

One way around this would be to add an option to Targets that would disable writing the _SUCCESS flag, which would be part of a more general change to allow per-Source and per-Target configuration options. For example, you could specify that some outputs of an MR job were compressed using gzip, and others were compressed using Snappy, instead of having a single compression strategy for everything.

On Mon, Sep 9, 2013 at 10:28 AM, Hansen,Chuck <Chuck.Hansen@cerner.com> wrote:

> With Crunch versions prior to 0.7.x, there does not appear to be a
> _SUCCESS file written upon completion; starting with 0.7.x there is. This
> file (and any others not intended to be read through [1]) appears to cause
> issues with [1]. This means writing a MapFile with Crunch and reading it back
> with [1] works prior to 0.7.x, but starting with 0.7.x, [1] will throw an
> exception.
>
> Is this a bug with Crunch and/or Hadoop?
>
> [1] org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.getReaders
>
> Hadoop CDH versions used:
>
>     <hadoopCoreVersion>2.0.0-mr1-cdh4.2.1</hadoopCoreVersion>
>     <hadoop_commonAndHDFSVersion>2.0.0-cdh4.2.1</hadoop_commonAndHDFSVersion>
>
> --
> Chuck Hansen
> Software Engineer, Record Dev
> chuck.hansen@cerner.com | 816-201-9629
> Cerner Corporation | www.cerner.com

--
Director of Data Science
Cloudera
Twitter: @josh_wills
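
As a rough illustration of the hidden-file filtering described in the reply above, here is a minimal sketch of the "_" / "." naming convention. It uses plain strings so that it runs without Hadoop on the classpath; the class name VisibleOutputFilter and the file names in the listing are hypothetical and are not part of Crunch or Hadoop. In a real job, the same check would presumably live in an implementation of org.apache.hadoop.fs.PathFilter, whose accept(Path) method could test path.getName() in the same way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Crunch or Hadoop): decides which entries
// of a job output directory are data rather than bookkeeping files.
class VisibleOutputFilter {

    // Hidden-file convention from the thread: names starting with "_" or ".".
    static boolean accept(String fileName) {
        return !(fileName.startsWith("_") || fileName.startsWith("."));
    }

    // Keeps only the names a reader should try to open.
    static List<String> visible(List<String> fileNames) {
        return fileNames.stream()
                .filter(VisibleOutputFilter::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative listing of an output directory, including a _SUCCESS marker.
        List<String> listing = Arrays.asList(
                "_SUCCESS", ".part-r-00000.crc", "part-r-00000", "part-r-00001");
        System.out.println(visible(listing)); // prints [part-r-00000, part-r-00001]
    }
}
```

With a filter like this, one option on the reading side would be to list the output directory through FileSystem.listStatus(Path, PathFilter) and open a reader only for the entries that remain, so the _SUCCESS marker is never treated as a MapFile.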