hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6208) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process
Date Wed, 06 May 2015 03:27:47 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6208:
----------------------------------------
    Labels: BB2015-05-TBR inputformat mapfile  (was: inputformat mapfile)

> There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Jens Rabe
>            Assignee: Jens Rabe
>              Labels: BB2015-05-TBR, inputformat, mapfile
>         Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In some cases, large amounts of data are organized in MapFiles, e.g., from previous
> MapReduce tasks, and only a fraction of that data is to be processed in an MR task. The current
> approach, as I understand it, is to reorganize the data into suitable partitions using folders
> on HDFS, use only the relevant folders as input paths, and possibly do some additional filtering
> in the Map task. However, sometimes the input data cannot easily be partitioned that way,
> for example when processing large amounts of measured data where additional data for a time
> period already in HDFS arrives later.
> There should be an input format that accepts folders containing MapFiles, with an option
> to specify the input key range so that only matching InputSplits are generated.
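
The key-range selection requested above can be illustrated with a minimal sketch in plain Java. It is not taken from the attached patches and does not use the Hadoop API. The names KeyRangeSplitFilter, SplitInfo, overlaps and selectSplits are hypothetical, and the sketch assumes long keys (for example timestamps) and that the smallest and largest key of each MapFile are already known, which is plausible because MapFiles store their entries sorted by key.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of key-range split pruning. This is NOT the
 * MAPREDUCE-6208 patch and NOT a Hadoop class; all names are illustrative.
 */
final class KeyRangeSplitFilter {

    /**
     * Hypothetical stand-in for one candidate input split backed by a MapFile.
     * Assumption: keys are longs and the first and last key are known up front.
     */
    static final class SplitInfo {
        final String path;
        final long firstKey;
        final long lastKey;

        SplitInfo(String path, long firstKey, long lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    /** Two closed ranges [a, b] and [c, d] overlap exactly when a <= d and c <= b. */
    static boolean overlaps(SplitInfo split, long minKey, long maxKey) {
        return split.firstKey <= maxKey && minKey <= split.lastKey;
    }

    /** Keeps only the candidates whose key range intersects [minKey, maxKey]. */
    static List<SplitInfo> selectSplits(List<SplitInfo> candidates, long minKey, long maxKey) {
        if (minKey > maxKey) {
            throw new IllegalArgumentException("minKey must not exceed maxKey");
        }
        List<SplitInfo> selected = new ArrayList<>();
        for (SplitInfo split : candidates) {
            if (overlaps(split, minKey, maxKey)) {
                selected.add(split);
            }
        }
        return selected;
    }
}
```

With four hypothetical part files covering keys 0 to 99, 100 to 199, 200 to 299 and 300 to 399, calling selectSplits with the range 150 to 250 keeps only the second and third. Records outside the range inside those two files would still need to be skipped when reading, which this sketch does not cover.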



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
