crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: CrunchJobHooks.handleMultiPaths(..) file pattern expectations
Date Fri, 26 Apr 2013 20:21:37 GMT
>> Can you gist up a patch and/or post it to a JIRA so we can take a look?

I'll work on cleaning up my code a bit and attach it to a JIRA.


On Fri, Apr 26, 2013 at 3:19 PM, Josh Wills <jwills@cloudera.com> wrote:

> Can you gist up a patch and/or post it to a JIRA so we can take a look?
>
>
> On Fri, Apr 26, 2013 at 12:30 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:
>
>> As mentioned, I'm currently trying to add Avro Trevni support to
>> Crunch.  I think I've gotten everything working, except that my
>> output is not being copied to the correct directory upon completion.
>>
>> I'm extending the FileTargetImpl and have the following in my
>> implementation:
>>
>>     @Override
>>     public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
>>          .....
>>         configureForMapReduce(job, AvroKey.class, NullWritable.class,
>>                 AvroTrevniKeyOutputFormat.class, outputPath, name);
>>
>>         // AvroTrevniKeyOutputFormat writes its content directly to the
>>         // configured output path, so reset that path to include the name.
>>         if (name != null) {
>>             FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
>>         }
>>
>> This produces the following in the crunch tmp directory:
>>
>> $ pwd
>>
>> /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
>> $ ls
>> _SUCCESS part-m-00000
>> $ cd part-m-00000/
>> $ ls -l
>> total 8
>> -rwxrwxrwx  1 mw010351  staff  493 Apr 26 13:52 part-0.trv
>> -rw-r--r--  1 mw010351  staff    0 Apr 26 13:52 part-m-00000
>>
>> The part-0.trv file is the one of most interest, and ideally I'd be able
>> to avoid the extra part-m-00000 directory (but I can work on that
>> configuration, because I think it is inside of Trevni).
>>
>> Unfortunately, the directories from the Crunch tmp dir aren't getting copied
>> to the expected output directory, because the CrunchJobHooks for completion
>> expects folders to be of the form "out#-*", and the directory that is
>> getting created does not have the "-" or take the form of the others
>> ("out0-m-00000").  Am I missing some configuration in my target that would
>> cause the directory to be created like that?  Or should the pattern for
>> finding directories to copy be relaxed to drop the final "-"?
>>
>> Thoughts?
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
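
To make the pattern question in the quoted message concrete, here is a
minimal sketch of how the two candidate patterns behave. It is not the
Crunch code: it uses java.nio glob matching as a stand-in for whatever
lookup CrunchJobHooks actually performs, and the class name
OutputGlobSketch and its helper methods are hypothetical. The only thing
taken from the thread is the "out#-*" form and the directory names
"out0" and "out0-m-00000".

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical illustration only; not part of Crunch.
public class OutputGlobSketch {

    // Pattern of the form described in the thread: "out#-*".
    public static String strictPattern(int index) {
        return "out" + index + "-*";
    }

    // Relaxed form without the trailing "-", as proposed in the thread.
    public static String relaxedPattern(int index) {
        return "out" + index + "*";
    }

    // Returns true when the single directory name matches the glob.
    // Assumption: java.nio glob semantics stand in for the real lookup.
    public static boolean matches(String glob, String name) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        String[] names = {"out0-m-00000", "out0", "out1-m-00000", "out10-m-00000"};
        for (int index = 0; index <= 1; index++) {
            for (String name : names) {
                System.out.println("index=" + index + " name=" + name
                        + " strict=" + matches(strictPattern(index), name)
                        + " relaxed=" + matches(relaxedPattern(index), name));
            }
        }
    }
}
```

Under these assumptions the strict pattern skips a bare "out0" directory,
which is the symptom described above. Simply dropping the "-" picks it
up, but the relaxed pattern for index 1 would then also match
"out10-m-00000", so a relaxed pattern would probably need some other way
to keep output indexes apart.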
