crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: CrunchJobHooks.handleMultiPaths(..) file pattern expectations
Date Fri, 26 Apr 2013 20:20:24 GMT
After a little more digging into Trevni to explain the extra directory,
here's the code from the writer [1] that picks the directory:

   Path outputPath = FileOutputFormat.getOutputPath(context);

   String dir = FileOutputFormat.getUniqueFile(context, "part", "");

In the above code outputPath equals "out0" and it doesn't calculate a
new directory.
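
For illustration, here is a rough sketch of how that unique name seems to end up as a nested directory. It does not call Hadoop: uniqueFile is a hypothetical stand-in, and the name-m/r-NNNNN format is an assumption based on the "part-m-00000" directory seen in the listings in this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical sketch; none of these names come from Hadoop, Trevni, or Crunch.
final class UniqueFileSketch {

    // Assumed naming convention: <name>-<m|r>-<5-digit partition><extension>.
    static String uniqueFile(String name, boolean mapTask, int partition, String extension) {
        return String.format(Locale.ROOT, "%s-%s-%05d%s",
                name, mapTask ? "m" : "r", partition, extension);
    }

    public static void main(String[] args) {
        // Mirrors getUniqueFile(context, "part", "") for map task 0.
        String dir = uniqueFile("part", true, 0, "");
        // The writer appears to use that name as a directory under the output path,
        // which would explain the extra part-m-00000 level under out0.
        Path nested = Paths.get("out0").resolve(dir);
        System.out.println(nested);
    }
}
```

If that assumption holds, whatever is set as the job's output path becomes the parent of the part-m-00000 directory.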

I tweaked the code in my configureForMapReduce (quoted below) to be:

        // AvroTrevniKeyOutputFormat uses this set value to write content
        // directly to this path. Therefore resetting the value with the
        // named value.
        if (name != null) {
            FileOutputFormat.setOutputPath(job, new Path(outputPath, name + "-tmp"));
        }

and it now copies the content:

/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit3781988509412645745/output/part-m-00000/part-m-00000
$ ls -l
total 8
-rwxrwxrwx  1 mw010351  staff  493 Apr 26 15:16 part-0.trv
-rw-r--r--  1 mw010351  staff    0 Apr 26 15:16 part-m-00000

The target is pointed at
/var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit3781988509412645745/output.
So this is getting closer, but it still seems like the "-" restriction in
the path pattern might need to be removed.
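
To make the difference concrete, here is a minimal sketch of the two matching rules. The regular expressions are my own approximation of the "out#-*" form described in this thread, not the pattern CrunchJobHooks actually uses, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Crunch's implementation.
final class OutputDirPatternSketch {

    // Assumed strict form: "out<N>-<anything>", e.g. "out0-m-00000".
    static final Pattern STRICT = Pattern.compile("out\\d+-.*");

    // Assumed relaxed form: "out<N>" with an optional "-<anything>" suffix.
    static final Pattern RELAXED = Pattern.compile("out\\d+(-.*)?");

    static boolean matchesStrict(String name) {
        return STRICT.matcher(name).matches();
    }

    static boolean matchesRelaxed(String name) {
        return RELAXED.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"out0-m-00000", "out0", "out0-tmp", "part-m-00000"};
        for (String name : names) {
            System.out.println(name + " strict=" + matchesStrict(name)
                    + " relaxed=" + matchesRelaxed(name));
        }
    }
}
```

Under these assumed patterns, a bare "out0" directory only matches the relaxed rule, while "out0-tmp" matches both, which is consistent with the "-tmp" workaround getting picked up.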

[1] - http://svn.apache.org/repos/asf/avro/tags/release-1.7.4/lang/java/trevni/avro/src/main/java/org/apache/trevni/avro/mapreduce/AvroTrevniRecordWriterBase.java




On Fri, Apr 26, 2013 at 2:30 PM, Micah Whitacre <mkwhitacre@gmail.com> wrote:

> So as mentioned I'm currently trying out adding Avro Trevni support to
> Crunch.  I think I've gotten everything working with the exception that my
> output is not being copied to the correct directory upon completion.
>
> I'm extending the FileTargetImpl and have the following in my
> implementation:
>
>     @Override
>     public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
>          .....
>         configureForMapReduce(job, AvroKey.class, NullWritable.class,
>                 AvroTrevniKeyOutputFormat.class, outputPath, name);
>
>         // AvroTrevniKeyOutputFormat uses this set value to write content
>         // directly to this path. Therefore resetting the value with the
>         // named value.
>         if (name != null) {
>             FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
>         }
>
> This produces the following in the crunch tmp directory:
>
> $ pwd
>
> /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
> $ ls
> _SUCCESS part-m-00000
> $ cd part-m-00000/
> $ ls -l
> total 8
> -rwxrwxrwx  1 mw010351  staff  493 Apr 26 13:52 part-0.trv
> -rw-r--r--  1 mw010351  staff    0 Apr 26 13:52 part-m-00000
>
> The part-0.trv file is the one of most interest, and ideally I'd be able to
> avoid the extra part-m-00000 directory (but I can work on that
> configuration because it is inside of Trevni, I think).
>
> Unfortunately the directories from the crunch tmp dir aren't getting copied
> to the expected output directory because the CrunchJobHooks for completion
> expect folders to be of the form "out#-*", and the directory that is
> getting created does not have the "-" or take the form that others do
> ("out0-m-00000").  Am I missing some configuration in my target that would
> cause the directory to be created like that?  Or should the pattern for
> finding directories to copy be relaxed to not require the final "-"?
>
> Thoughts?
>
