crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-543) AvroPathPerKeyTarget copy nested subdirectories
Date Fri, 16 Oct 2015 16:32:05 GMT


Micah Whitacre commented on CRUNCH-543:

Thanks for the patches [~aeckstein].  

We'll probably need to work out if this is going to conflict a bit with @tomwhite's patch for 

Also, I'm concerned about potentially keeping that many writers open per map task.  The best
practice for this target is to make sure all the instances of a single key are in the same
partition[1], which helps to avoid having all of those writers open at once.

[1] -
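
To make the concern about open writers concrete, here is a small standalone sketch (plain Java, no Crunch or Hadoop involved) that counts how many distinct (partition, key) pairs, i.e. writers, would be opened when records are spread across partitions in arrival order versus routed by key. The class name, method names, and sample keys are all hypothetical and only illustrate the reasoning; in a real pipeline the routing would come from grouping on the key before the write.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: simulates record-to-partition assignment.
class WritersPerPartitionSketch {

    /** Counts the writers opened across all partitions (one per distinct key seen in a partition). */
    static int totalWriters(List<String> keys, int partitions, boolean partitionByKey) {
        List<Set<String>> openWriters = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            openWriters.add(new HashSet<>());
        }
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            // Either route by key hash (same key -> same partition) or by arrival order.
            int p = partitionByKey ? Math.floorMod(key.hashCode(), partitions) : i % partitions;
            openWriters.get(p).add(key);
        }
        int total = 0;
        for (Set<String> writers : openWriters) {
            total += writers.size();
        }
        return total;
    }

    /** Four keys, interleaved, eight records each. */
    static List<String> sampleKeys() {
        String[] distinct = {"foo/bar", "foo/baz", "a", "b"};
        List<String> keys = new ArrayList<>();
        for (int round = 0; round < 8; round++) {
            for (String k : distinct) {
                keys.add(k);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(totalWriters(sampleKeys(), 3, false)); // prints 12: every partition sees every key
        System.out.println(totalWriters(sampleKeys(), 3, true));  // prints 4: each key lives in one partition
    }
}
```

With keys routed by hash, each key needs exactly one writer overall, so the number of writers a single task holds is bounded by the keys assigned to it rather than by the total number of keys.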

> AvroPathPerKeyTarget copy nested subdirectories
> -----------------------------------------------
>                 Key: CRUNCH-543
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Adric Eckstein
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>         Attachments: CRUNCH-543.patch, CRUNCH-543b.patch, CRUNCH-543c.patch
> When using AvroPathPerKeyTarget to write out a subpath in the output directory using
> a String key, the key might indicate multiple subfolders:
> Pair<String, String> kv = new Pair<String, String>("foo/bar", "value");
> PTable<String, String> kvs = pipeline.create(Arrays.asList(kv),Avros.tableOf(Avros.strings(),
> PTables.asPTable(kvs).write(new AvroPathPerKeyTarget("output"));
> This throws the error:
> java.lang.IllegalArgumentException: Reducer output name 'bar' cannot be parsed
> 	at$CompletionHook.handleMultiPaths(
> ...
> In AvroPathPerKeyTarget the handleOutputs method would need to recursively copy subfolders
> (currently only checks first level in output directory) to enable keys that define multiple
> subfolders.
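
As a rough illustration of the recursive handling described in the issue, the sketch below walks a working directory to any depth and moves each file to the same relative path under the output directory. It is not the attached patch and does not use the Hadoop FileSystem API that the target itself works with; it uses java.nio on the local filesystem, and the class name, method names, and file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem sketch of moving nested per-key output.
class NestedMoveSketch {

    /** Lists every regular file under root, at any depth, as a sorted path relative to root. */
    static List<Path> relativeFilesUnder(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .map(root::relativize)
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Moves each file from srcRoot to the same relative location under dstRoot. */
    static void moveNested(Path srcRoot, Path dstRoot) {
        for (Path relative : relativeFilesUnder(srcRoot)) {
            Path target = dstRoot.resolve(relative);
            try {
                // Create intermediate directories such as foo/bar before moving the file.
                Files.createDirectories(target.getParent());
                Files.move(srcRoot.resolve(relative), target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Builds a small nested tree in temp directories, moves it, and lists the result. */
    static List<String> demo() {
        try {
            Path working = Files.createTempDirectory("working");
            Path output = Files.createTempDirectory("output");
            Files.createDirectories(working.resolve("foo").resolve("bar"));
            Files.write(working.resolve("foo").resolve("bar").resolve("part-0.avro"), new byte[] {1});
            Files.createDirectories(working.resolve("baz"));
            Files.write(working.resolve("baz").resolve("part-0.avro"), new byte[] {2});
            moveNested(working, output);
            return relativeFilesUnder(output).stream()
                    .map(p -> p.toString().replace('\\', '/'))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The file list is collected before anything is moved, so the directory walk is never modified while it is still being iterated.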

This message was sent by Atlassian JIRA
