hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2921) align map splits on sorted files with key boundaries
Date Sun, 02 Mar 2008 15:52:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574225#action_12574225 ]

Runping Qi commented on HADOOP-2921:

I see.

+1 to making the current SequenceFileRecordReader do the key boundary check,
or implementing a new record reader just for that.

> align map splits on sorted files with key boundaries
> ----------------------------------------------------
>                 Key: HADOOP-2921
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2921
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.16.0
>            Reporter: Joydeep Sen Sarma
> (this is something that we have implemented in the application layer - may be useful
> to have in hadoop itself).
> long term log storage systems often keep data sorted (by some sort-key). future computations
> on such files can often benefit from this sort order. if the job requires grouping by the
> sort-key - then it should be possible to do reduction in the map stage itself.
> this is not natively supported by hadoop (except in the degenerate case of 1 map file
> per task) since splits can span the sort-key. however aligning the data read by the map task
> to sort key boundaries is straightforward - and this would be a useful capability to have
> in hadoop.
> the definition of the sort key should be left up to the application (it's not necessarily
> the key field in a SequenceFile) through a generic interface - but otherwise - the SequenceFile
> and text file readers can use the extracted sort key to align map task data with key boundaries.
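
The alignment described above could be sketched roughly as follows. This is only an illustration, not Hadoop code and not the reporter's implementation: the class `KeyAlignedSplits`, the method `readAligned`, and the use of list indexes in place of byte offsets are all assumptions made for the example. The sort key is supplied by the application as a function, as the description suggests. A reader for the nominal split [start, end) skips leading records that continue the key group of the previous split, then reads past `end` until the last key group it started is complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: not part of Hadoop. Record indexes stand in for byte offsets.
class KeyAlignedSplits {

    /**
     * Returns the records a map task would process for the nominal split
     * [start, end) of a list that is already sorted by the sort key.
     */
    static <R, K> List<R> readAligned(List<R> sorted, int start, int end,
                                      Function<R, K> sortKey) {
        int size = sorted.size();
        end = Math.min(end, size);
        int pos = start;

        // Skip records that belong to the key group straddling the split start;
        // the reader of the previous split consumes that whole group.
        if (start > 0 && start < size) {
            K prev = sortKey.apply(sorted.get(start - 1));
            while (pos < size && sortKey.apply(sorted.get(pos)).equals(prev)) {
                pos++;
            }
        }

        List<R> out = new ArrayList<>();
        if (pos >= end) {
            // The whole nominal split lies inside a group owned by an earlier split.
            return out;
        }

        // Read past the nominal end until the last key group is complete.
        int stop = end;
        K last = sortKey.apply(sorted.get(end - 1));
        while (stop < size && sortKey.apply(sorted.get(stop)).equals(last)) {
            stop++;
        }

        out.addAll(sorted.subList(pos, stop));
        return out;
    }

    public static void main(String[] args) {
        // Records are "key:value" strings; the sort key is the part before ':'.
        List<String> records = Arrays.asList("a:1", "a:2", "b:1", "b:2", "b:3", "c:1");
        Function<String, String> key = r -> r.substring(0, r.indexOf(':'));
        System.out.println(readAligned(records, 0, 3, key));
        System.out.println(readAligned(records, 3, 6, key));
    }
}
```

With this scheme every key group is seen by exactly one map task, so a job that groups by the sort key can aggregate each group inside the map stage. A real record reader would work on byte offsets and sync points rather than list indexes.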

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
