crunch-dev mailing list archives

From "Adric Eckstein (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-543) AvroPathPerKeyTarget copy nested subdirectories
Date Thu, 15 Oct 2015 19:24:05 GMT


Adric Eckstein updated CRUNCH-543:
    Attachment: CRUNCH-543c.patch

I forgot to add the second patch, which changes the output format. It helps map-only jobs, where the keys
are not necessarily sorted, by keeping a map of the open writers and closing all files only
after the task completes. These two together seem to prevent the file collisions in the avro
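A minimal sketch of the writer-map idea described above, assuming plain text files in place of Avro DataFileWriters. The class name KeyedWriterPool and the part-file name are hypothetical and are not Crunch's AvroPathPerKeyOutputFormat API. Each key gets one writer that stays open for the whole task, so a key that reappears out of order in a map-only job appends to the same file instead of opening a second writer on the same path; close() then closes every writer once, when the task completes.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Crunch's AvroPathPerKeyOutputFormat): one writer per key,
 * kept open for the whole task so unsorted keys reuse the same file.
 */
public class KeyedWriterPool implements AutoCloseable {
    private final Path baseDir;
    private final Map<String, Writer> openWriters = new HashMap<>();

    public KeyedWriterPool(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Appends a record under baseDir/key/, opening that key's writer on first use. */
    public void write(String key, String record) throws IOException {
        Writer writer = openWriters.get(key);
        if (writer == null) {
            // The key may contain '/', e.g. "foo/bar", so create every parent directory.
            Path dir = baseDir.resolve(key);
            Files.createDirectories(dir);
            // Hypothetical part-file name; opened exactly once per key per task.
            writer = Files.newBufferedWriter(dir.resolve("part-m-00000.txt"), StandardCharsets.UTF_8);
            openWriters.put(key, writer);
        }
        writer.write(record);
        writer.write('\n');
    }

    /** Called once when the task completes: closes every writer opened during the task. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (Writer writer : openWriters.values()) {
            try {
                writer.close();
            } catch (IOException e) {
                // Keep closing the remaining writers; rethrow the first failure afterwards.
                if (first == null) {
                    first = e;
                }
            }
        }
        openWriters.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

With sorted keys a writer could be closed as soon as the key changes; the map is what makes the unsorted map-only case safe, at the cost of holding one open file per distinct key until the task ends.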

> AvroPathPerKeyTarget copy nested subdirectories
> -----------------------------------------------
>                 Key: CRUNCH-543
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Adric Eckstein
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>         Attachments: CRUNCH-543.patch, CRUNCH-543b.patch, CRUNCH-543c.patch
> When using AvroPathPerKeyTarget to write out a subpath in the output directory using a String key, the key might indicate multiple nested subfolders:
> Pair<String, String> kv = new Pair<String, String>("foo/bar", "value");
> PTable<String, String> kvs = pipeline.create(Arrays.asList(kv), Avros.tableOf(Avros.strings(), Avros.strings()));
> PTables.asPTable(kvs).write(new AvroPathPerKeyTarget("output"));
> This throws the error:
> java.lang.IllegalArgumentException: Reducer output name 'bar' cannot be parsed
> 	at$CompletionHook.handleMultiPaths(
> ...
> In AvroPathPerKeyTarget, the handleOutputs method would need to copy subfolders recursively (it currently checks only the first level of the output directory) so that keys can define multiple nested subfolders.
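The recursive copy that handleOutputs would need can be sketched as follows. This is a hypothetical helper (NestedOutputMover.moveRecursively is not part of Crunch), and it uses java.nio.file in place of the Hadoop FileSystem API that Crunch itself works with. Files.walk visits every depth of the temporary output directory, so a key such as "foo/bar" ends up under output/foo/bar instead of only the first level being inspected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of a recursive handleOutputs step: walk the whole
 * temporary output tree and move each file to the same relative path
 * under the final output directory.
 */
public final class NestedOutputMover {
    private NestedOutputMover() {}

    /** Moves every regular file under src to dst, preserving nested subdirectories. */
    public static List<Path> moveRecursively(Path src, Path dst) throws IOException {
        // Collect first, so the tree is not modified while it is being walked.
        List<Path> files = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(src)) {
            walk.filter(Files::isRegularFile).forEach(files::add);
        }
        List<Path> moved = new ArrayList<>();
        for (Path file : files) {
            Path relative = src.relativize(file);   // e.g. foo/bar/part-r-00000.avro
            Path target = dst.resolve(relative);
            Files.createDirectories(target.getParent());
            // No REPLACE_EXISTING: a name collision fails loudly instead of overwriting output.
            Files.move(file, target);
            moved.add(relative);
        }
        return moved;
    }
}
```

Leaving out StandardCopyOption.REPLACE_EXISTING is a deliberate choice here: if two tasks produce the same relative path, Files.move throws FileAlreadyExistsException rather than silently dropping one task's output.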

This message was sent by Atlassian JIRA
