hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2954) In streaming, map-output cannot have empty keys
Date Tue, 25 Mar 2008 03:03:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Robert Chansler updated HADOOP-2954:

    Fix Version/s:     (was: 0.17.0)

> In streaming, map-output cannot have empty keys
> -----------------------------------------------
>                 Key: HADOOP-2954
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2954
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Sameer Paranjpye
> Here is the analysis for the case where both the mapper and the reducer are /bin/cat,
> with the default key field separator '\t' (tab).
> For example, if the input line is:
> the input for the mapper ('cat' in this case) is:
> -
> the output of the mapper is split into a key, value pair as below:
> (key, value) -> (\tSDSDFIKSDFSDFJS, "")
> (i.e. the value is empty)
> the function that splits the output into a (key, value) pair for
> streaming jobs ignores the first character of the line, so a leading
> separator is never treated as the end of the key
> -
> from the above (key, value) pair, the input for the reducer is:
> (key followed by separator followed by value)
> if the reducer is set to NONE, the above line is the output of
> the map task
> -
> the output of the reducer ('cat' in this case) is:
> -
> if the line starts with the field separator, the output of the mapper
> may be assigned to different reducers, because the line can contain
> more than one instance of the field separator. For example:
> input-line=\tABCDEFGH
> key=\tABCDEFGH
> value=
> (value is empty)
> output-line=\tABCDEFGH\t
> value=JHUHJH
> output-line=\tABCDEFGHYH\tJHUHJH
> assuming defaults (HashPartitioner), they are likely to be assigned to
> different reducers because the keys are different.
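The partitioning effect described above can be sketched as follows. This is a minimal illustration, not Hadoop's code: `illustrative_hash` is a hypothetical stand-in (a 31-multiplier polynomial hash wrapped to a signed 32-bit integer), and `partition` follows the usual hash-partitioning scheme of masking off the sign bit and taking the remainder by the number of reduce tasks. The two keys are the ones shown in the example above.

```python
def illustrative_hash(data: bytes) -> int:
    """Hypothetical stand-in hash: 31-multiplier polynomial, signed 32-bit."""
    h = 1
    for b in data:
        h = (31 * h + b) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, num_reducers: int) -> int:
    """Hash partitioning: clear the sign bit, then take the remainder."""
    return (illustrative_hash(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


# Keys produced when the leading separator is absorbed into the key.
observed_keys = ["\tABCDEFGH", "\tABCDEFGHYH"]
# Keys if everything before the first separator (nothing) were the key.
expected_keys = ["", ""]

for key in observed_keys + expected_keys:
    print(repr(key), "->", partition(key, 4))
```

Two records with the same (empty) key always reach the same reducer. Once the leading tab is folded into the key, the keys differ, and nothing guarantees they hash to the same partition.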
> The streaming contract says that everything from the beginning of the line up to the first tab is the
> key, so the key should be the empty string. But it is not.
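To make the difference concrete, here is a minimal sketch of the two splitting behaviours. Both function names are hypothetical and neither is the streaming source code: `split_skipping_first_char` models the reported behaviour by assuming the separator search starts at index 1, and `split_per_contract` models the contract by searching from index 0.

```python
def split_skipping_first_char(line: str, sep: str = "\t"):
    """Reported behaviour (assumed): the search never looks at index 0."""
    pos = line.find(sep, 1)
    if pos == -1:
        return line, ""          # no separator found: whole line is the key
    return line[:pos], line[pos + 1:]


def split_per_contract(line: str, sep: str = "\t"):
    """Contract: the key is everything before the first separator."""
    pos = line.find(sep)
    if pos == -1:
        return line, ""
    return line[:pos], line[pos + 1:]


line = "\tSDSDFIKSDFSDFJS"
print(split_skipping_first_char(line))  # key keeps the leading tab, value is empty
print(split_per_contract(line))         # key is empty, value is the rest of the line
```

For lines that do not start with the separator, the two functions agree. They differ only when the key is empty, which is the case this issue reports.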

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
