hadoop-common-dev mailing list archives

From "NOMURA Yoshihide (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3481) TextInputFormat should support character encoding settings
Date Wed, 04 Jun 2008 00:56:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

NOMURA Yoshihide updated HADOOP-3481:

    Status: Open  (was: Patch Available)

Thank you for your reply.

> For instance, this is not a good thing:
>   +  } catch (Exception e) {
>   +    // nop
>   +  }
> Firstly, one should never catch "Exception". Secondly, if you catch an exception, you
> should do something about it. I don't think it is acceptable to silently substitute a different
> character encoding. Instead, there should be a fatal error.

I agree.
It should be a fatal error.
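A minimal sketch of the fail-fast behaviour agreed on here follows. It is not code from the patch, and it assumes the encoding name arrives as a plain String. The class and method names (EncodingResolver, resolve) are hypothetical. It catches only the two specific exceptions that Charset.forName throws for bad names, and it rethrows them rather than substituting another charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper: resolves a configured encoding name or fails loudly.
class EncodingResolver {
    static Charset resolve(String encodingName) {
        if (encodingName == null || encodingName.isEmpty()) {
            throw new IllegalArgumentException("No character encoding configured");
        }
        try {
            return Charset.forName(encodingName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Catch only the specific failures and turn them into a fatal error;
            // never fall back silently to a different encoding.
            throw new IllegalArgumentException(
                    "Unsupported character encoding: " + encodingName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8").name());
        try {
            resolve("no-such-encoding");
        } catch (IllegalArgumentException e) {
            System.out.println("Fatal: " + e.getMessage());
        }
    }
}
```

The wrapper exception is unchecked on purpose. A misconfigured job then stops at setup instead of producing mis-decoded records.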

> For another, it looks like you are decoding the characters explicitly instead of just
> setting the encoding on the input reader. Am I missing something?

You mean, why don't I use the InputStreamReader class?
A Reader decodes characters automatically, but it hides the actual byte length of each decoded String.
So I think the characters should be decoded explicitly in this class.
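The byte-length point can be illustrated with the sketch below. It is not the submitted patch. It splits the raw bytes on '\n' first and only then decodes each line with a CharsetDecoder, so the byte offset and byte length of every record stay known. This matters because TextInputFormat keys each line by its byte offset in the file, while String.length() counts UTF-16 chars. The class ByteAwareLineSplitter is hypothetical. The sketch also assumes an ASCII-compatible encoding in which byte 0x0A never appears inside a multi-byte character. That holds for UTF-8 and MS932/Shift_JIS but not for UTF-16.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: finds line boundaries in bytes, then decodes explicitly.
class ByteAwareLineSplitter {
    static final class Line {
        final long byteOffset; // position of the line's first byte in the input
        final int byteLength;  // bytes in the line, excluding the '\n'
        final String text;     // decoded characters

        Line(long byteOffset, int byteLength, String text) {
            this.byteOffset = byteOffset;
            this.byteLength = byteLength;
            this.text = text;
        }
    }

    static List<Line> split(byte[] data, Charset charset) throws CharacterCodingException {
        // REPORT makes malformed or unmappable input an error instead of a silent U+FFFD.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<Line> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                if (i == data.length && start == i) {
                    break; // input ended with '\n'; no trailing partial line
                }
                // decode() resets the decoder, so it can be reused per line.
                CharBuffer chars = decoder.decode(ByteBuffer.wrap(data, start, i - start));
                lines.add(new Line(start, i - start, chars.toString()));
                start = i + 1;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] data = "caf\u00e9\nabc\n".getBytes(StandardCharsets.UTF_8);
        for (Line line : split(data, StandardCharsets.UTF_8)) {
            System.out.println(line.byteOffset + " bytes=" + line.byteLength
                    + " chars=" + line.text.length() + " " + line.text);
        }
    }
}
```

Run as-is, main reports 5 bytes but only 4 chars for the first line, because "é" takes two bytes in UTF-8. The second line starts at byte offset 6. A Reader would return only the 4-char String and lose that byte count.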

> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>                 Key: HADOOP-3481
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3481
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.17.0
>         Environment: Windows XP SP3
>            Reporter: NOMURA Yoshihide
> I need to read text files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support such encodings.
> I suggest that TextInputFormat support encoding settings like this:
>   conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
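The sketch below shows how the proposed setting could be read and applied. It uses java.util.Properties as a stand-in for Hadoop's Configuration so that it runs on a plain JDK. Note that io.file.defaultEncoding is the key proposed in this issue, not an existing Hadoop property. The fallback to UTF-8 when the key is unset is also an assumption. The class ConfiguredDecoding is hypothetical. MS932 (Windows-31J) may not be present in every JRE, so main checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical example; Properties stands in for Hadoop's Configuration.
class ConfiguredDecoding {
    // Key proposed in this issue; not an existing Hadoop property.
    static final String ENCODING_KEY = "io.file.defaultEncoding";

    // Returns the configured charset, assuming UTF-8 when the key is unset.
    static Charset configuredCharset(Properties conf) {
        // Charset.forName throws an IllegalArgumentException subclass for
        // unknown names, so a bad setting fails instead of being ignored.
        return Charset.forName(conf.getProperty(ENCODING_KEY, "UTF-8"));
    }

    static String decode(Properties conf, byte[] bytes) {
        return new String(bytes, configuredCharset(conf));
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("MS932")) {
            System.out.println("MS932 is not available in this JRE");
            return;
        }
        Properties conf = new Properties();
        conf.setProperty(ENCODING_KEY, "MS932");
        byte[] ms932 = "\u65e5\u672c".getBytes(configuredCharset(conf)); // Japanese text
        System.out.println(decode(conf, ms932));                       // decoded with MS932
        System.out.println(new String(ms932, StandardCharsets.UTF_8)); // mis-decoded as UTF-8
    }
}
```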

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
