MIME-Version: 1.0
From: David Ortiz
Date: Mon, 22 Jun 2015 19:38:29 +0000
Subject: Re: Retrieving Input File Name with MRPipeline
To: user@crunch.apache.org
Content-Type: multipart/alternative; boundary=089e011848389c9f5c0519206b26

--089e011848389c9f5c0519206b26
Content-Type: text/plain; charset=UTF-8

Gave it a shot in the following MapFn, but it seems to always return null.

    new MapFn<String, Pair<String, String>>() {

      private static final long serialVersionUID = 1L;
      int min = minColumns;
      int max = maxColumns;

      @Override
      public Pair<String, String> map(String input) {
        //int columns = StringUtils.countMatches(input, "\t") + 1;
        int columns = input.split("\t").length;
        if (columns >= min && columns <= max) {
          StringBuilder output = new StringBuilder(input);
          output.append('\t');
          String loc = this.getContext().getConfiguration().get(TaskInputOutputContext.MAP_INPUT_FILE);
          output.append(loc);
          return new Pair<>(output.toString(), null);
        } else {
          return new Pair<>(null, input);
        }
      }
    }

Also tried setting crunch.disable.combine.file to true, figuring that
combine files might mess with it. No dice. Does anything look suspect in
that snippet?

Thanks,
    Dave

On Mon, Jun 22, 2015 at 2:41 PM Micah Whitacre wrote:

> The DoFn should give you access to the TaskInputOutputContext[1] which
> should contain that information. I believe the context then should hold
> the file as a config like "MAP_INPUT_FILE". I haven't really tested this
> out so definitely verify.
>
> [1] -
> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/TaskInputOutputContext.html
>
> On Mon, Jun 22, 2015 at 1:28 PM, David Ortiz wrote:
>
>> Hello,
>>
>>      Is there a way in my crunch pipeline that I can retrieve the file
>> name of the input file for my MapFn? This function is definitely applied
>> as a Mapper, so I think it should be possible, just having some difficulty
>> working through the exact method of doing so.
>>
>> Thanks,
>>      Dave
>>
>

--089e011848389c9f5c0519206b26
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--089e011848389c9f5c0519206b26--