crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-632) Add compression support for CSVFileSource
Date Thu, 12 Jan 2017 02:07:16 GMT


Micah Whitacre commented on CRUNCH-632:

> you're right that they aren't typically splittable (assuming gzip compression is used), but
> for example Snappy compression does support input splits.

I thought the issue was that Snappy is only splittable inside block-based file formats (e.g. Avro
and sequence files), not when it is applied as whole-file compression.

> Add compression support for CSVFileSource
> -----------------------------------------
>                 Key: CRUNCH-632
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Jim McStanton
>            Priority: Minor
> Currently CSVFileSource does not support decompressing files before reading them, and
> simply opens the file and starts reading the contents:

> This source would more closely match TextFileSource if this support were added. The {{LineRecordReader}}
> supports this behavior [here|].
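
As a rough illustration of the behavior requested in the description above, the sketch below picks a decompressor from the file extension and wraps the raw stream before any lines are read. It uses only the JDK's gzip classes so that it runs on its own; Hadoop's LineRecordReader does the analogous lookup through CompressionCodecFactory. The class name CompressedCsvSketch and the helpers openMaybeCompressed, readLines and gzip are hypothetical and are not part of Crunch or of any patch for this issue.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedCsvSketch {

    // Hypothetical helper: choose a decompressor from the file extension.
    // Only ".gz" is handled here; any other name is read as plain text.
    static InputStream openMaybeCompressed(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads every line from the (possibly compressed) stream, starting at byte 0.
    static List<String> readLines(String fileName, InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                openMaybeCompressed(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Test helper: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,alice\n2,bob\n";
        List<String> plain = readLines("data.csv",
                new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)));
        List<String> unzipped = readLines("data.csv.gz",
                new ByteArrayInputStream(gzip(csv)));
        // Both paths should yield the same CSV lines.
        System.out.println(plain.equals(unzipped));
    }
}
```

Because a gzip stream can only be decoded from its beginning, a reader built this way would have to treat each compressed file as a single split, which is the splittability concern raised in the comment.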

This message was sent by Atlassian JIRA
