hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-263) TCTLSeparatedProtocol should use UTF-8 to decode the data
Date Tue, 03 Feb 2009 01:03:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669825#action_12669825 ]

Joydeep Sen Sarma commented on HIVE-263:
----------------------------------------

+1

The only thing that concerns me is that if any row does not conform to UTF-8, the whole
job fails. Earlier we had tried a setup where the SerDe throws a SerDeException and we
deal with it in the query layer (for example, we could ignore some fixed % of bad rows).

But looking at the code, this might be hard to do right now, so I'm happy to wait for a
rewrite for this :-)
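
To make the "ignore some fixed % of bad rows" idea concrete, here is a minimal sketch using only the JDK. It is not Hive code: the class name BadRowBudget, both method names, and the maxBadFraction parameter are hypothetical, and a strict JDK CharsetDecoder stands in for Text.decode. A real version would throw a SerDeException from the SerDe and do the counting in the query layer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hive code: strict UTF-8 decoding with a bad-row budget.
class BadRowBudget {

    // Strictly decodes one row; throws on malformed UTF-8 instead of
    // silently substituting a replacement character.
    static String decodeStrict(byte[] buf, int offset, int length)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(buf, offset, length)).toString();
    }

    // Decodes all rows and skips malformed ones. Fails only when the fraction
    // of skipped rows exceeds maxBadFraction (for example 0.01 for 1%).
    static List<String> decodeRows(List<byte[]> rows, double maxBadFraction) {
        List<String> good = new ArrayList<>();
        int bad = 0;
        for (byte[] row : rows) {
            try {
                good.add(decodeStrict(row, 0, row.length));
            } catch (CharacterCodingException e) {
                bad++; // a real query layer would likely also log the row
            }
        }
        if (!rows.isEmpty() && (double) bad / rows.size() > maxBadFraction) {
            throw new IllegalStateException(
                    bad + " of " + rows.size() + " rows were not valid UTF-8");
        }
        return good;
    }
}
```

With this shape, one malformed row no longer kills the job by itself; the job fails only once the bad-row fraction crosses the configured threshold.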

> TCTLSeparatedProtocol should use UTF-8 to decode the data
> ---------------------------------------------------------
>
>                 Key: HIVE-263
>                 URL: https://issues.apache.org/jira/browse/HIVE-263
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-263.1.patch, HIVE-263.2.patch
>
>
> TCTLSeparatedProtocol now uses the default character encoding. We should use UTF8 from
> hadoop Text class:
> Now:
> {code}
>           String row = new String(buf, 0, length);
> {code}
> We want:
> {code}
>           String row;
>           try {
>             row = Text.decode(buf, 0, length);
>           } catch (CharacterCodingException e) {
>             throw new RuntimeException(e);
>           }
> {code}
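
A small JDK-only sketch of why the "Now" form in the quoted description is fragile: new String(buf, 0, length) decodes with whatever the platform default charset is, so the same bytes can yield different strings on different machines. The class and method names below are hypothetical, and ISO-8859-1 is only an assumed example of a non-UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Hive or Hadoop.
class DefaultCharsetPitfall {

    // Decodes with an explicit charset; passing different charsets simulates
    // running the "Now" code on platforms with different defaults.
    static String decodeAs(byte[] buf, int length, Charset cs) {
        return new String(buf, 0, length, cs);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes C3 A9.
        byte[] buf = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8: one character, the original.
        System.out.println(decodeAs(buf, buf.length, StandardCharsets.UTF_8).length());

        // Decoded as ISO-8859-1: two characters (U+00C3, U+00A9), i.e. mojibake.
        System.out.println(decodeAs(buf, buf.length, StandardCharsets.ISO_8859_1).length());

        // Malformed UTF-8: the String constructor does not throw; it substitutes
        // the replacement character U+FFFD for the bad byte.
        byte[] broken = {(byte) 0xC3, (byte) 0x28};
        System.out.println(
                decodeAs(broken, broken.length, StandardCharsets.UTF_8).contains("\uFFFD"));
    }
}
```

Note that pinning the charset in the String constructor would fix the platform dependence but would still hide malformed rows behind U+FFFD. The try/catch in the "We want" snippet indicates that Text.decode reports malformed input instead, which is what makes the whole-job failure discussed in the comment possible.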

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

