hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1441) Splittability of input should be controllable by application
Date Wed, 20 Jun 2007 20:02:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1441:

    Status: Open  (was: Patch Available)

One can already implement this with either:

class MyInputFormat extends FileInputFormat {
  // Never split: each file goes to a single map task as a whole.
  protected boolean isSplitable(FileSystem fs, Path path) { return false; }
}

or, even more simply, with:

job.setLong("mapred.min.split.size", Long.MAX_VALUE);

So I'm not convinced we need this.
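
To illustrate why the second option works, here is a minimal standalone sketch. It assumes the split size is chosen as max(minSize, min(goalSize, blockSize)); that formula is my understanding of FileInputFormat's behaviour and is an assumption here, not something stated above. The class and method names are hypothetical, and the split count ignores any slop factor the real code may apply.

```java
// Sketch only: models the assumed split-size arithmetic; this is not Hadoop code.
final class SplitSizeSketch {

    // Assumed rule: never go below minSize, otherwise take the smaller of
    // the per-map goal size and the file system block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified split count: ceiling of fileLength / splitSize, written to
    // avoid overflow when splitSize is Long.MAX_VALUE.
    static long countSplits(long fileLength, long splitSize) {
        if (fileLength == 0) {
            return 0;
        }
        return (fileLength - 1) / splitSize + 1;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // illustrative 64 MB block
        long fileLength = 10 * blockSize;     // illustrative 640 MB file
        long goalSize = fileLength / 4;       // as if 4 maps were requested

        // Small minimum split size: the file is cut at block boundaries.
        long blockLevel = computeSplitSize(goalSize, 1L, blockSize);
        System.out.println(countSplits(fileLength, blockLevel));

        // mapred.min.split.size = Long.MAX_VALUE: one split per file.
        long wholeFile = computeSplitSize(goalSize, Long.MAX_VALUE, blockSize);
        System.out.println(countSplits(fileLength, wholeFile));
    }
}
```

Under that assumption, the huge minimum dominates both the goal size and the block size, so no file is ever large enough to be cut into more than one split.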

Also, if we were to add a new FileInputFormat configuration parameter for this, then it should
have a name that indicates it's specific to FileInputFormat, like "mapred.fileinputformat.splitable",
and we should add static methods in FileInputFormat to get and set it.  (We have not been
good about this in the past, but, for new code, that's the preferred style.)  And then we
probably don't need to document it in hadoop-default.xml, since it's not something folks would
need to specify in a config file.
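
As a rough sketch of that style: the static accessors might look something like the following. The method names setSplitable and getSplitable are hypothetical, java.util.Properties stands in for the job configuration so the example runs on its own, and the default of true is taken from the issue's suggestion that block-level splitting remain the default.

```java
import java.util.Properties;

// Sketch only: a stand-in for FileInputFormat showing the accessor style.
// Properties substitutes for the real job configuration class.
final class FileInputFormatSketch {

    // The parameter name carries the class it belongs to.
    static final String SPLITABLE_KEY = "mapred.fileinputformat.splitable";

    // Hypothetical setter: applications call this instead of using the raw key.
    static void setSplitable(Properties conf, boolean splitable) {
        conf.setProperty(SPLITABLE_KEY, Boolean.toString(splitable));
    }

    // Hypothetical getter: defaults to true, i.e. block-level splits as today.
    static boolean getSplitable(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SPLITABLE_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getSplitable(conf)); // default
        setSplitable(conf, false);
        System.out.println(getSplitable(conf)); // after the application opts out
    }
}
```

Keeping the key string behind static methods means callers never spell the property name themselves, which is why it would not need an entry in hadoop-default.xml.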

> Splittability of input should be controllable by application
> ------------------------------------------------------------
>                 Key: HADOOP-1441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1441
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.3
>         Environment: ALL
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.14.0
>         Attachments: HADOOP-1441_1.patch
> Currently, the isSplitable method of FileInputFormat always returns true. For some applications,
> it is necessary that a map task process an entire file, rather than a block. Therefore,
> splittability of input (i.e. block-level split vs. file-level split) should be controllable
> by the user via a configuration variable. The default could be block-level split, as is.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
