crunch-dev mailing list archives

From "Adric Eckstein (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-543) AvroPathPerKeyTarget copy nested subdirectories
Date Thu, 15 Oct 2015 19:01:05 GMT


Adric Eckstein commented on CRUNCH-543:

I think the attached CRUNCH-543b.patch should fix it (the two loops were only working with
an even number of subdirectories).  

Also, I removed the renaming of the files during the move. I believe this was causing file
collisions when you don't group by the key prior to writing out (and you shouldn't have to group).
If we stick with the output part* names, the files can safely be copied without risk of collision.
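
To illustrate the idea (this is not the code in CRUNCH-543b.patch), here is a minimal sketch of
a recursive move that descends to any depth and keeps each file's original part* name. It
assumes plain java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API;
the class name NestedOutputMover and the method moveTree are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of Crunch.
class NestedOutputMover {

    // Moves every regular file under srcDir to the same relative location
    // under dstDir, keeping the original file name (e.g. part-r-00000.avro).
    // Returns the relative paths that were moved, sorted, with '/' separators.
    static List<String> moveTree(Path srcDir, Path dstDir) throws IOException {
        List<String> moved = new ArrayList<>();
        moveTree(srcDir, srcDir, dstDir, moved);
        Collections.sort(moved);
        return moved;
    }

    private static void moveTree(Path root, Path current, Path dstDir,
                                 List<String> moved) throws IOException {
        List<Path> children;
        try (Stream<Path> listing = Files.list(current)) {
            children = listing.collect(Collectors.toList());
        }
        for (Path child : children) {
            if (Files.isDirectory(child)) {
                // Recurse, so a key such as "foo/bar" or "a/b/c" works at any depth.
                moveTree(root, child, dstDir, moved);
            } else {
                Path relative = root.relativize(child);
                Path target = dstDir.resolve(relative);
                Files.createDirectories(target.getParent());
                // No rename and no REPLACE_EXISTING: an unexpected name clash
                // throws FileAlreadyExistsException instead of overwriting data.
                Files.move(child, target);
                moved.add(relative.toString().replace('\\', '/'));
            }
        }
    }
}
```

Recursing per directory, rather than looping over a fixed number of levels, avoids any dependence
on how many subdirectories the key contains.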

> AvroPathPerKeyTarget copy nested subdirectories
> -----------------------------------------------
>                 Key: CRUNCH-543
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Adric Eckstein
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>         Attachments: CRUNCH-543.patch, CRUNCH-543b.patch
> When using AvroPathPerKeyTarget to write out a subpath in the output directory using
> a String key, the key might indicate multiple subfolders:
> Pair<String, String> kv = new Pair<String, String>("foo/bar", "value");
> PTable<String, String> kvs = pipeline.create(Arrays.asList(kv),Avros.tableOf(Avros.strings(),
> PTables.asPTable(kvs).write(new AvroPathPerKeyTarget("output"));
> This throws the error:
> java.lang.IllegalArgumentException: Reducer output name 'bar' cannot be parsed
> 	at$CompletionHook.handleMultiPaths(
> ...
> In AvroPathPerKeyTarget, the handleOutputs method would need to copy subfolders recursively
> (it currently checks only the first level of the output directory) to support keys that
> define multiple subfolders.

This message was sent by Atlassian JIRA
