crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-429) The CSVFileSource does not always function properly
Date Tue, 30 Dec 2014 22:14:13 GMT


Micah Whitacre commented on CRUNCH-429:

[~unluckyboy], interesting. I don't typically use S3. My suggestion was to cut down on retrieving
the FileSystem object, because for a Source it typically would not change. In your S3 use
case, do you interact with multiple instances, so that you would need to vary the config
with each path? Or do you mix reading CSV files from HDFS and S3 inside a single Source?
The reason I ask is that you should still be able to use the current CSVFileSource by configuring
the connection information for S3 using the Source's inputConf(...) methods[1].

If that is prohibitive, feel free to open another issue and we can enhance the Source code.

[1] -,
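
To illustrate the inputConf(...) suggestion above, here is a minimal, self-contained sketch of
per-source configuration. SourceSketch and effectiveConf are hypothetical stand-ins, not Crunch
classes; only the fluent inputConf(key, value) shape is meant to mirror the Source method the
message refers to. The fs.s3n.* keys are the usual Hadoop s3n credential properties, and the
bucket name and values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only: these are NOT Crunch's Source or Hadoop's Configuration.
class InputConfSketch {

    static class SourceSketch {
        private final String path;
        private final Map<String, String> extraConf = new HashMap<>();

        SourceSketch(String path) {
            this.path = path;
        }

        // Records a key/value pair that applies to this source only.
        SourceSketch inputConf(String key, String value) {
            extraConf.put(key, value);
            return this;
        }

        String path() {
            return path;
        }

        // Copies the job-wide settings, then overlays this source's overrides.
        // The job-wide map itself is left untouched.
        Map<String, String> effectiveConf(Map<String, String> jobConf) {
            Map<String, String> merged = new HashMap<>(jobConf);
            merged.putAll(extraConf);
            return merged;
        }
    }

    // Builds a source whose S3 connection details travel with the source itself.
    static SourceSketch exampleS3Source() {
        return new SourceSketch("s3n://example-bucket/input.csv")
                .inputConf("fs.s3n.awsAccessKeyId", "PLACEHOLDER_KEY_ID")
                .inputConf("fs.s3n.awsSecretAccessKey", "PLACEHOLDER_SECRET");
    }
}
```

With this shape, an HDFS-backed source and an S3-backed source can share one job-wide
configuration, and each carries only the settings that differ.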

> The CSVFileSource does not always function properly
> ---------------------------------------------------
>                 Key: CRUNCH-429
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: mac champion
>            Assignee: mac champion
>            Priority: Minor
>              Labels: csv, csvparser
>             Fix For: 0.8.4, 0.11.0
>         Attachments: 0001-CRUNCH-429-Fix-CSVInputFormat.patch, CRUNCH-429_a.patch
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> The "configure" method of CSVInputFormat is never called, so it has no effect on the format's
> configuration. Instead, the class needs to implement Configurable and set its configuration
> options in an overridden setConf method.
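
The pattern described in the issue can be sketched without Hadoop on the classpath. In the
sketch below, Configurable, CsvFormatSketch, newInstance and the "csv.buffersize" key are all
hypothetical stand-ins; the assumption is that the framework creates the format reflectively
and calls setConf only on objects that implement Configurable (which is how Hadoop's
ReflectionUtils.newInstance behaves), so a free-standing configure method is never reached.

```java
import java.util.Map;

// Hypothetical model only: not the real CSVInputFormat or Hadoop interfaces.
class SetConfSketch {

    // Stand-in for org.apache.hadoop.conf.Configurable, using a Map as the config.
    interface Configurable {
        void setConf(Map<String, String> conf);
    }

    static class CsvFormatSketch implements Configurable {
        private int bufferSize = 64 * 1024; // assumed default, for illustration
        private boolean configureCalled = false;

        // Nothing in newInstance below knows about this method, so it never runs.
        void configure(Map<String, String> conf) {
            configureCalled = true;
            bufferSize = Integer.parseInt(conf.get("csv.buffersize"));
        }

        // The framework hook: options are read here instead.
        @Override
        public void setConf(Map<String, String> conf) {
            String value = conf.get("csv.buffersize");
            if (value != null) {
                bufferSize = Integer.parseInt(value);
            }
        }

        int bufferSize() {
            return bufferSize;
        }

        boolean configureCalled() {
            return configureCalled;
        }
    }

    // Creates an instance reflectively and hands it the configuration
    // only if it implements Configurable.
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + cls.getName(), e);
        }
    }
}
```

Moving the option handling into setConf means the settings take effect wherever the framework
instantiates the class, with no extra call required from user code.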

This message was sent by Atlassian JIRA
