crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-331) Change default settings for CombineFileInputFormat
Date Mon, 03 Feb 2014 22:33:06 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890019#comment-13890019 ]

Micah Whitacre commented on CRUNCH-331:
---------------------------------------

So that sounds like you are leaning towards option #1?

From the consumption patterns I have, that is probably OK (with the exception of adding
hfiles to the default enabled list), but communicating the change would be very important,
as you brought up.

> Change default settings for CombineFileInputFormat
> --------------------------------------------------
>
>                 Key: CRUNCH-331
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-331
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Josh Wills
>
> Currently, we default to enabling the CombineFileInputFormat settings for any extensions
> of FileSourceImpl b/c it tends to improve performance for common file formats like text, sequence
> files, and Avro files. However, this default has caused problems for formats like Parquet
> and for custom file formats that have complex split logic.
> This JIRA is to track modifying the default combine file settings in at least some contexts,
> such as with From.formattedFile for custom input formats.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
