hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-771) PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
Date Mon, 27 Apr 2009 16:22:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703195#action_12703195 ]

David Ciemiewicz commented on PIG-771:
--------------------------------------

Very strange.  I can display UTF-8 Chinese characters in my Mac OS Terminal window.  Only dump has a problem.

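Here's the transcript of what I did.  If you look, you'll see that cat shows the characters correctly for both the input file and the stored output, while dump prints (????), both from a script and from the grunt shell: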

{code}
-bash-3.00$ cat > ch.txt
中文测试

-bash-3.00$ file ch.txt
ch.txt: UTF-8 Unicode text

-bash-3.00$ cat ch.txt
中文测试

-bash-3.00$ cat ch.pig
A = load 'ch.txt' using PigStorage() as (str: chararray);
dump A;
store A into 'ch.out' using PigStorage();

-bash-3.00$ pig -exectype local ch.pig
USING: /grid/0/gs/pig/current
2009-04-27 16:15:16,314 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-27 16:15:16,315 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(????)
2009-04-27 16:15:16,339 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-27 16:15:16,339 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!

-bash-3.00$ cat ch.out
中文测试

-bash-3.00$ pig -exectype local 
USING: /grid/0/gs/pig/current
grunt> A = load 'ch.txt' using PigStorage() as (str: chararray);
grunt> dump A;
2009-04-27 16:16:51,786 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-27 16:16:51,786 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(????)
grunt> 
{code}
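
One possible explanation, which the transcript above does not confirm, is that the text is encoded on its way to the console with a charset that cannot represent Chinese characters. Java writers replace each unmappable character with '?', which would match the (????) output. The sketch below is only an illustration of that behaviour in plain Java: EncodingSketch and roundTrip are hypothetical names, not part of Pig, and an OutputStreamWriter stands in for whatever stream dump actually writes to.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; it is not Pig code.
final class EncodingSketch {

    // Writes text through a writer that encodes with the given charset,
    // then decodes the resulting bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(buffer, charset));
        out.print(text);
        out.flush();
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Unicode escapes for the four characters in ch.txt, wrapped in
        // parentheses like a one-field tuple, so the source file itself
        // does not depend on the compiler's source encoding.
        String tuple = "(\u4e2d\u6587\u6d4b\u8bd5)";

        // US-ASCII cannot represent the characters; each one becomes '?'.
        System.out.println(roundTrip(tuple, StandardCharsets.US_ASCII));

        // UTF-8 preserves them, so the round trip returns the original string.
        System.out.println(roundTrip(tuple, StandardCharsets.UTF_8).equals(tuple));
    }
}
```

If this is what is happening, the data itself is intact (as the stored ch.out shows) and only the final encoding step loses the characters. Whether the loss comes from the JVM's default encoding or from the toString() path is not something this sketch can determine.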

> PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-771
>                 URL: https://issues.apache.org/jira/browse/PIG-771
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF-8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on the opaque object d.
> I think that the code should be changed to call the new DataType.toString() function instead.
> {code}
>     @Override
>     public String toString() {
>         StringBuilder sb = new StringBuilder();
>         sb.append('(');
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object d = it.next();
>             if(d != null) {
>                 if(d instanceof Map) {
>                     sb.append(DataType.mapToString((Map<Object, Object>)d));
>                 } else {
>                     sb.append(DataType.toString(d));  // <<< Change this one line
>                     if(d instanceof Long) {
>                         sb.append("L");
>                     } else if(d instanceof Float) {
>                         sb.append("F");
>                     }
>                 }
>             } else {
>                 sb.append("");
>             }
>             if (it.hasNext())
>                 sb.append(",");
>         }
>         sb.append(')');
>         return sb.toString();
>     }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

