hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-771) PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
Date Mon, 27 Apr 2009 21:18:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703383#action_12703383 ]

David Ciemiewicz commented on PIG-771:
--------------------------------------

Daniel,

Thanks.  Running locale reported LANG=POSIX.

I used locale -a to list the available locales and then ran:

export LANG=en_US.utf8

With that, I got the correct PigDump output.

I also found it useful to set LESSCHARSET to utf-8 so that less displays the characters correctly.

For bash users:

export LANG=en_US.utf8
export LESSCHARSET=utf-8
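
As a rough illustration of where the question marks come from: my understanding is that with LANG=POSIX the JVM typically ends up with an ASCII default charset, and characters that charset cannot represent are replaced with '?' when they are encoded for output. The class and method names below (QuestionMarkDemo, roundTrip) are made up for this sketch and are not part of Pig; it only imitates the encoding step, it does not reproduce the actual PigDump code path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    // Encode the text with the given charset and decode it back,
    // imitating text that is written through a stream using that charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // itself does not depend on the compiler's source encoding.
        String chinese = "\u4e2d\u6587";

        // US-ASCII cannot represent these characters, so each one is
        // replaced by '?' during encoding. Prints: ??
        System.out.println(roundTrip(chinese, StandardCharsets.US_ASCII));

        // UTF-8 can represent them, so the round trip is lossless. Prints: true
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese));

        // Shows which charset this JVM picked up from its environment.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Running this once with LANG=POSIX and once with LANG=en_US.utf8 should show whether the last line changes on a given JVM.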


It would be useful if dump/PigDump() warned the user that, when LANG=POSIX,
Asian language characters may not display properly.  Something like:

if (function_which_returns_locale_setting_which_I_dont_know_name_of().equals("POSIX")) {
    System.out.println("WARNING: dump will not properly display multibyte UTF-8 characters"
        + " when environment variable LANG=\"POSIX\".  Try setting your environment variable"
        + " LANG=en_US.utf8.  See locale -a for other possible values.");
}
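
A slightly more concrete sketch of that check, assuming it is acceptable to read LANG directly with System.getenv(). The class name DumpLocaleWarning and the method warningFor are hypothetical and do not exist in Pig. The sketch also treats LANG=C the same as POSIX, which is my assumption, and it ignores LC_ALL and LC_CTYPE even though they take precedence over LANG.

```java
public class DumpLocaleWarning {

    // Returns a warning message when the given LANG value names the plain
    // POSIX/C locale, or null when no warning is needed.
    static String warningFor(String lang) {
        if ("POSIX".equals(lang) || "C".equals(lang)) {
            return "WARNING: dump will not properly display multibyte UTF-8 characters"
                + " when environment variable LANG=\"" + lang + "\"."
                + "  Try setting your environment variable LANG=en_US.utf8."
                + "  See locale -a for other possible values.";
        }
        return null;
    }

    public static void main(String[] args) {
        // System.getenv returns null when LANG is not set at all;
        // warningFor then returns null and nothing is printed.
        String warning = warningFor(System.getenv("LANG"));
        if (warning != null) {
            // Use stderr so the warning does not mix with the dumped tuples.
            System.err.println(warning);
        }
    }
}
```

Checking java.nio.charset.Charset.defaultCharset() instead of LANG might be more robust, since that reflects what the JVM actually chose, but I have not verified how that behaves across platforms.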

> PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-771
>                 URL: https://issues.apache.org/jira/browse/PIG-771
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on the opaque object d.
> Instead, I think that the code should be changed to call the new DataType.toString() function.
> {code}
>     @Override
>     public String toString() {
>         StringBuilder sb = new StringBuilder();
>         sb.append('(');
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object d = it.next();
>             if(d != null) {
>                 if(d instanceof Map) {
>                     sb.append(DataType.mapToString((Map<Object, Object>)d));
>                 } else {
>                     sb.append(DataType.toString(d));  // <<< Change this one line
>                     if(d instanceof Long) {
>                         sb.append("L");
>                     } else if(d instanceof Float) {
>                         sb.append("F");
>                     }
>                 }
>             } else {
>                 sb.append("");
>             }
>             if (it.hasNext())
>                 sb.append(",");
>         }
>         sb.append(')');
>         return sb.toString();
>     }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

