hadoop-common-dev mailing list archives

From "Peeyush Bishnoi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3306) streaming should default to KeyValueTextInputFormat with IdentityMapper
Date Wed, 28 May 2008 14:17:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600481#action_12600481 ]

Peeyush Bishnoi commented on HADOOP-3306:
-----------------------------------------

This issue is somewhat similar to HADOOP-2876.

> streaming should default to KeyValueTextInputFormat with IdentityMapper
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-3306
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3306
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>    Affects Versions: 0.15.3
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>
> in 15.3 - streaming defaults to TextInputFormat (without -inputformat option).
> this is great in case the PipeMapper is used. but in many cases people want to do an
> IdentityMapper - and it fails with the IdentityMapper:
> a) the map output key type becomes LongWritable (but hadoop has already defaulted to
> expect Text)
> b) the map output key is the Line number - and intuitively - this is not what the user
> expects (almost no one wants to use the line number as the map key).
> if we could simply default to KeyValueTextInputFormat with IdentityMapper - that would
> resolve both of these problems. This would change default behavior though - so a little
> leery ..
> using '-mapper cat' is the common workaround - but it just seems like a needless waste
> of resources ..
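
To make the difference described in the issue concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the function names are hypothetical and only model how the two input formats produce records. TextInputFormat hands the mapper a numeric key (strictly the byte offset of the line within the file, which the issue refers to as the line number) with the whole line as the value, while KeyValueTextInputFormat splits each line at the first tab into key and value. An identity mapper passes whatever it receives straight through, so the choice of input format decides the map output key.

```python
# Illustrative model only; none of these functions are Hadoop APIs.

def text_input_records(data: bytes):
    """Model TextInputFormat: key = byte offset of the line, value = the line."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_text_records(data: bytes, sep: str = "\t"):
    """Model KeyValueTextInputFormat: split each line at the first separator."""
    for raw in data.splitlines():
        key, _, value = raw.decode("utf-8").partition(sep)
        # With no separator, the whole line is the key and the value is empty.
        yield key, value


def identity_map(records):
    """Model IdentityMapper: emit every (key, value) pair unchanged."""
    return list(records)


sample = b"apple\t1\nbanana\t2\n"
text_out = identity_map(text_input_records(sample))
kv_out = identity_map(key_value_text_records(sample))
print(text_out)  # numeric keys, full lines as values
print(kv_out)    # text keys taken from the data itself
```

In the first case the keys are integers (LongWritable in Hadoop terms), which is the type mismatch in point a); in the second the keys are the text before the tab, which is what most users would expect from an identity map.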

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

