crunch-dev mailing list archives

From "Matthew Basil (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-337) Make it easier to use multiple input paths
Date Tue, 04 Feb 2014 15:10:09 GMT
Matthew Basil created CRUNCH-337:
------------------------------------

             Summary: Make it easier to use multiple input paths
                 Key: CRUNCH-337
                 URL: https://issues.apache.org/jira/browse/CRUNCH-337
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.9.0
            Reporter: Matthew Basil
            Assignee: Josh Wills
            Priority: Minor


It would be more user-friendly, especially for newbies, to provide methods on {{From}} for
creating sources from multiple {{Path}}s.  I'm currently attempting to write my first Crunch
Pipeline, which needs to read from multiple paths using a custom input format, and I needed
to dig into the source for {{From.formattedFile}} to see that I need to do something like this:

{code}
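// Build the table type from the key and value PTypes, as From.formattedFile does.
PTableType<K, V> tableType = keyType.getFamily().tableOf(keyType, valueType);
// Pass every input path to the source at once (paths is assumed here to be a List<Path>).
return new FileTableSourceImpl<K, V>(paths, tableType, formatClass);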
{code}

I don't particularly mind, but other potential new users might be a bit put off by having
to look at the source on the first line of their first pipeline. If it's undesirable to double
the number of methods in {{From}} by doing this (which is understandable), it might be nice
to add a note on multiple input paths to the section of the user guide on {{Source}}s.

Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
