crunch-user mailing list archives

From David Ortiz <dpo5...@gmail.com>
Subject Re: Retrieving Input File Name with MRPipeline
Date Mon, 22 Jun 2015 19:38:29 GMT
Gave it a shot in the following MapFn, but the configuration lookup seems to always return null.

new MapFn<String, Pair<String, String>>() {

   private static final long serialVersionUID = 1L;
   int min = minColumns;
   int max = maxColumns;

   @Override
   public Pair<String, String> map(String input) {
      //int columns = StringUtils.countMatches(input, "\t") + 1;
      int columns = input.split("\t").length;
      if (columns >= min && columns <= max) {
         StringBuilder output = new StringBuilder(input);
         output.append('\t');
         String loc = this.getContext().getConfiguration()
               .get(TaskInputOutputContext.MAP_INPUT_FILE);
         output.append(loc);
         return new Pair<>(output.toString(), null);
      } else {
         return new Pair<>(null, input);
      }
   }

}


Also tried setting crunch.disable.combine.file to true figuring that
combine files might mess with it.  No dice.  Does anything look
suspect in that snippet?
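
One side note on the column check, separate from the null lookup: input.split("\t").length and the commented-out countMatches(input, "\t") + 1 are not equivalent, because String.split with no limit drops trailing empty strings. A row whose last columns are empty would therefore count short and could land in the wrong branch. Below is a minimal plain-Java sketch of the difference; ColumnCounter and its method names are hypothetical helpers written for illustration, not part of Crunch or Hadoop.

```java
// Hypothetical helper for illustration only; not part of Crunch or Hadoop.
final class ColumnCounter {

    private ColumnCounter() {}

    // Mirrors the check in the MapFn: split with no limit drops trailing empty fields.
    static int bySplit(String line) {
        return line.split("\t").length;
    }

    // A negative limit tells split to keep trailing empty fields.
    static int bySplitKeepingEmpties(String line) {
        return line.split("\t", -1).length;
    }

    // Counts tab characters plus one, like the commented-out countMatches line.
    static int byDelimiterCount(String line) {
        int tabs = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                tabs++;
            }
        }
        return tabs + 1;
    }

    public static void main(String[] args) {
        String row = "a\tb\t\t"; // four columns, the last two empty
        System.out.println(bySplit(row));               // prints 2
        System.out.println(bySplitKeepingEmpties(row)); // prints 4
        System.out.println(byDelimiterCount(row));      // prints 4
    }
}
```

If rows with empty trailing columns can show up in the input, split("\t", -1) would keep the count in line with the countMatches version.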


Thanks,

    Dave


On Mon, Jun 22, 2015 at 2:41 PM Micah Whitacre <mkwhitacre@gmail.com> wrote:

> The DoFn should give you access to the TaskInputOutputContext[1] which
> should contain that information.  I believe the context then should hold
> the file as a config like "MAP_INPUT_FILE".  I haven't really tested this
> out so definitely verify.
>
>
> [1] -
> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/TaskInputOutputContext.html
>
> On Mon, Jun 22, 2015 at 1:28 PM, David Ortiz <dpo5003@gmail.com> wrote:
>
>> Hello,
>>
>>       Is there a way in my crunch pipeline that I can retrieve the file
>> name of the input file for my MapFn?  This function is definitely applied
>> as a Mapper, so I think it should be possible, just having some difficulty
>> working through the exact method of doing so.
>>
>> Thanks,
>>       Dave
>>
>
>
