Date: Tue, 30 Dec 2014 22:14:13 +0000 (UTC)
From: "Micah Whitacre (JIRA)"
To: crunch-dev@incubator.apache.org
Subject: [jira] [Commented] (CRUNCH-429) The CSVFileSource does not always function properly

[ https://issues.apache.org/jira/browse/CRUNCH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261560#comment-14261560 ]

Micah Whitacre commented on CRUNCH-429:
---------------------------------------

[~unluckyboy], interesting; I don't typically use S3. My suggestion was to cut down on retrieving the FileSystem object, because for a given Source it typically would not change. In your S3 use case, do you typically interact with multiple instances, so that you would need to vary the config with each path?
Or do you mix reading CSV files from HDFS and S3 inside a single Source? The reason I ask is that you should still be able to use the current CSVFileSource by configuring the S3 connection information through the Source's inputConf(...) methods [1]. If that is prohibitive, feel free to open another issue and we can enhance the Source code.

[1] - http://crunch.apache.org/apidocs/0.8.4/org/apache/crunch/Source.html#inputConf(java.lang.String, java.lang.String)

> The CSVFileSource does not always function properly
> ---------------------------------------------------
>
> Key: CRUNCH-429
> URL: https://issues.apache.org/jira/browse/CRUNCH-429
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.3
> Reporter: mac champion
> Assignee: mac champion
> Priority: Minor
> Labels: csv, csvparser
> Fix For: 0.8.4, 0.11.0
>
> Attachments: 0001-CRUNCH-429-Fix-CSVInputFormat.patch, CRUNCH-429_a.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> The "configure" method of CSVInputFormat does not have any effect on its configuration and is never called. Instead, the class needs to implement Configurable and set its configuration options in an overridden setConf method.
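
For readers unfamiliar with the pattern named in the issue description, here is a minimal sketch of why a setConf override takes effect where a standalone "configure" method does not. It is not the actual CSVInputFormat code or the attached patch. To stay self-contained it uses stand-ins: the nested Configurable interface imitates Hadoop's interface of the same name, a plain Map replaces Hadoop's Configuration, and CsvFormatSketch, the "csv.buffersize" key, its default value, and the newInstance helper are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableSketch {

    /** Stand-in for Hadoop's Configurable; a plain Map replaces Configuration here. */
    interface Configurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    /** Hypothetical input format; NOT the real CSVInputFormat. */
    static class CsvFormatSketch implements Configurable {
        static final String BUFFER_SIZE_KEY = "csv.buffersize"; // hypothetical key
        static final int DEFAULT_BUFFER_SIZE = 64 * 1024;       // hypothetical default

        private Map<String, String> conf;
        private int bufferSize = DEFAULT_BUFFER_SIZE;

        /** Options are read here because this is the hook the framework calls. */
        @Override
        public void setConf(Map<String, String> conf) {
            this.conf = conf;
            if (conf == null) {
                return; // keep defaults when no configuration is supplied
            }
            String value = conf.get(BUFFER_SIZE_KEY);
            if (value != null) {
                bufferSize = Integer.parseInt(value);
            }
        }

        @Override
        public Map<String, String> getConf() {
            return conf;
        }

        int getBufferSize() {
            return bufferSize;
        }
    }

    /**
     * Imitates a framework that creates instances reflectively and then hands
     * the configuration to anything that implements Configurable. A method
     * with any other name (such as "configure") would never be invoked here.
     */
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CsvFormatSketch.BUFFER_SIZE_KEY, "4096");
        CsvFormatSketch format = newInstance(CsvFormatSketch.class, conf);
        System.out.println(format.getBufferSize()); // prints 4096
    }
}
```

The point of the sketch is only that configuration has to be applied inside the callback the instantiating code actually invokes; in the sketch that callback is setConf, which matches what the issue description proposes.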