crunch-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-219) Support multiple paths in Avro source
Date Tue, 18 Jun 2013 10:31:20 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated CRUNCH-219:
-----------------------------

    Attachment: CRUNCH-219.patch

Here's a new patch that adds multi-path support to all file-based inputs.

I haven't changed MaterializableIterable, and I'm not sure a change is needed, since only Sources
can have multiple paths. Targets and SourceTargets still have a single path. In each of
MapsideJoinStrategy, BloomFilterJoinStrategy, and Sort, the PCollection being materialized
is not an input collection, so it's a SourceTarget (I think) and hence has a single path. (I'm
not sure it's even possible to give MaterializableIterable a getPaths() method, since
FilterKeysWithBloomFilterFn calls PType.getPath() with a single path to get a SourceTarget.)
Does this sound right to you, Josh, or am I missing something?
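
To make the distinction above concrete, here is a minimal sketch in plain Java. It is not
Crunch's API and not taken from the patch: MultiPathSource and SinglePathSourceTarget are
hypothetical names, and the local filesystem (java.nio.file) stands in for HDFS. It only
illustrates a read-only input that accepts any mix of files and directories, next to a
materialized output that keeps exactly one path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class MultiPathSketch {

    /** Hypothetical read-only input: may span several files and/or directories. */
    static final class MultiPathSource {
        private final List<Path> paths;

        MultiPathSource(List<Path> paths) {
            if (paths.isEmpty()) {
                throw new IllegalArgumentException("at least one path is required");
            }
            this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
        }

        List<Path> getPaths() {
            return paths;
        }

        /** A file is kept as-is; a directory contributes its direct regular files. */
        List<Path> resolveFiles() throws IOException {
            List<Path> files = new ArrayList<>();
            for (Path p : paths) {
                if (Files.isDirectory(p)) {
                    try (Stream<Path> children = Files.list(p)) {
                        children.filter(Files::isRegularFile).sorted().forEach(files::add);
                    }
                } else {
                    files.add(p);
                }
            }
            return files;
        }
    }

    /** Hypothetical materialized output that can be read back: always one path. */
    static final class SinglePathSourceTarget {
        private final Path path;

        SinglePathSourceTarget(Path path) {
            this.path = path;
        }

        Path getPath() {
            return path;
        }
    }

    public static void main(String[] args) throws IOException {
        // One directory holding two files, plus one standalone file.
        Path dir = Files.createTempDirectory("input");
        Files.createFile(dir.resolve("part-0.avro"));
        Files.createFile(dir.resolve("part-1.avro"));
        Path single = Files.createTempFile("extra", ".avro");

        MultiPathSource source = new MultiPathSource(Arrays.asList(dir, single));
        System.out.println(source.resolveFiles().size()); // prints 3
    }
}
```

Under these assumptions, only the input side needs a getPaths()-style accessor. Anything
that is written and later read back keeps a single getPath().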
                
> Support multiple paths in Avro source
> -------------------------------------
>
>                 Key: CRUNCH-219
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-219
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Tom White
>            Assignee: Josh Wills
>         Attachments: CRUNCH-219.patch, CRUNCH-219.patch
>
>
> It would be useful to be able to specify multiple paths (which may be files, or directories,
> or a combination of both) to read from in a source.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
