crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-632) Add compression support for CSVFileSource
Date Fri, 13 Jan 2017 23:50:26 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822533#comment-15822533 ]

Josh Wills commented on CRUNCH-632:
-----------------------------------

I'm with Gabriel: I don't think supporting BZip2Codec w/special handling makes a ton of sense
unless someone really wants it. +1 for the patch as-is.

> Add compression support for CSVFileSource
> -----------------------------------------
>
>                 Key: CRUNCH-632
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-632
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Jim McStanton
>            Assignee: Micah Whitacre
>            Priority: Minor
>         Attachments: CRUNCH-632b.patch, CRUNCH-632.patch
>
>
> Currently CSVFileSource does not support decompressing files before reading them, and
> simply opens the file and starts reading the contents: https://github.com/apache/crunch/blob/6280983179e9c690af69c2bf0e296b054122d724/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVRecordReader.java#L127.

> This source would more closely match TextFileSource if this support was added. The {{LineRecordReader}}
> supports this behavior [here|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.7.1/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java?av=f#87].
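
A minimal sketch of the behavior the issue asks for, written in plain Java: choose a decompressor from the file-name suffix (similar in spirit to how Hadoop maps file extensions to codecs for LineRecordReader) and wrap the raw stream before any CSV parsing happens. Assumptions, stated loudly: it handles gzip only, it uses the JDK's java.util.zip instead of Hadoop codecs, the class and method names (CompressedCsvSketch, maybeDecompress, readCsv) are hypothetical and not part of Crunch's CSVRecordReader, and the naive comma split ignores CSV quoting.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch (NOT Crunch or Hadoop API): pick a decompressor by
 * file-name suffix and wrap the raw stream before parsing CSV lines.
 * Only gzip is recognized; everything else is read as uncompressed bytes.
 */
public class CompressedCsvSketch {

    // Wrap the raw stream in a gzip decompressor when the name ends in ".gz".
    static InputStream maybeDecompress(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw; // uncompressed: read the bytes as-is
    }

    // Naive CSV reader for the demo: one row per line, no quoted fields or embedded newlines.
    static List<String[]> readCsv(String fileName, InputStream raw) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(maybeDecompress(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
        }
        return rows;
    }

    // Helper to build gzip-compressed test input in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,alice\n2,bob\n";
        List<String[]> plain = readCsv("people.csv",
                new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)));
        List<String[]> zipped = readCsv("people.csv.gz", new ByteArrayInputStream(gzip(csv)));
        System.out.println(plain.size() + " " + zipped.size()); // prints "3 3"
        System.out.println(zipped.get(2)[1]); // prints "bob"
    }
}
```

The sketch reads each compressed file as a single stream from its first byte, which matches the non-splittable case; it deliberately does not attempt the split-aware BZip2Codec handling discussed in the comment above.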

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
