hadoop-mapreduce-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
Date Wed, 04 Jun 2014 21:07:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018184#comment-14018184 ]

zhihai xu commented on MAPREDUCE-5777:
--------------------------------------

Hi Karthik,
Thanks for the comment. It looks like all the other methods in LineRecordReader are also duplicated
between MapRed (old API) and MapReduce (new API). Can we create a separate JIRA to handle that
duplication?
Thanks,
zhihai

> Support utf-8 text with BOM (byte order marker)
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5777
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0, 2.2.0
>            Reporter: bc Wong
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-5777.000.patch, MAPREDUCE-5777.001.patch, MAPREDUCE-5777.002.patch, MAPREDUCE-5777.003.patch, MAPREDUCE-5777.004.patch
>
>
> UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should
> recognize the BOM and not treat it as actual data.
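
For readers unfamiliar with the issue, the behaviour requested above can be sketched as follows. This is only an illustration, not the code in the attached patches: the class Utf8BomSketch and its methods are hypothetical names, and the sketch assumes the check is applied only to the bytes at the very start of a file. The one fact it relies on is that the UTF-8 BOM is U+FEFF encoded as the three bytes EF BB BF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class Utf8BomSketch {
    // U+FEFF encoded in UTF-8 is the three-byte sequence EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns 3 if the first 'length' bytes of 'buf' begin with the UTF-8 BOM, otherwise 0. */
    static int bomLength(byte[] buf, int length) {
        if (buf == null || length < UTF8_BOM.length || buf.length < UTF8_BOM.length) {
            return 0;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (buf[i] != UTF8_BOM[i]) {
                return 0;
            }
        }
        return UTF8_BOM.length;
    }

    /** Decodes the first line of a file as UTF-8, dropping a leading BOM if one is present. */
    static String decodeFirstLine(byte[] line) {
        int skip = bomLength(line, line.length);
        return new String(line, skip, line.length - skip, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', 'e', 'y'};
        byte[] withoutBom = {'k', 'e', 'y'};
        // Both lines decode to the same text once the BOM is skipped.
        System.out.println(decodeFirstLine(withBom).equals(decodeFirstLine(withoutBom)));
    }
}
```

A record reader would presumably need to run such a check only for the split that starts at byte offset 0, since a BOM can appear only at the beginning of the file; that placement is an assumption of this sketch rather than something stated in the issue.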



--
This message was sent by Atlassian JIRA
(v6.2#6252)
