hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-771) PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
Date Mon, 20 Apr 2009 16:46:47 GMT

    [ https://issues.apache.org/jira/browse/PIG-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700845#action_12700845 ]

David Ciemiewicz commented on PIG-771:
--------------------------------------

I was going to submit a patch for this one-line change, but while compiling the code I
discovered that DataType.toString(d) throws an ExecException.

Oddly, DataType.mapToString does NOT throw any exceptions, which is inconsistent with the other
DataType.to... functions.

I am not sure how to best implement the try / catch / throw for this particular case.
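
One possible shape for the try / catch / throw, as a minimal sketch rather than a patch. Everything here is hypothetical and stands apart from the Pig code base: FormatException stands in for ExecException (only the fact that it is checked matters), FieldFormatter.format stands in for DataType.toString(Object), and SketchTuple stands in for DefaultTuple. The point is that Object.toString() cannot declare a checked exception, so the override has to catch it and rethrow something unchecked.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for ExecException: a checked exception.
class FormatException extends Exception {
    FormatException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for DataType.toString(Object): it declares a checked exception.
class FieldFormatter {
    static String format(Object d) throws FormatException {
        if (d instanceof String || d instanceof Number) {
            return d.toString();
        }
        throw new FormatException("unsupported type: " + d.getClass().getName());
    }
}

// Hypothetical stand-in for DefaultTuple, reduced to the toString() logic.
class SketchTuple {
    private final List<Object> fields;

    SketchTuple(List<Object> fields) {
        this.fields = fields;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append('(');
        for (Iterator<Object> it = fields.iterator(); it.hasNext();) {
            Object d = it.next();
            if (d != null) {
                try {
                    sb.append(FieldFormatter.format(d));
                } catch (FormatException e) {
                    // toString() cannot declare checked exceptions,
                    // so wrap the failure in an unchecked one and keep the cause.
                    throw new RuntimeException("Unable to format field", e);
                }
            }
            if (it.hasNext()) {
                sb.append(',');
            }
        }
        sb.append(')');
        return sb.toString();
    }
}
```

Rethrowing makes a formatting failure visible to the caller. An alternative would be to fall back to d.toString() inside the catch block, which never fails but may hide the very problem this issue is about.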

Also, while reviewing DataType.mapToString(...), I discovered that it will have the same
problem dumping the data it contains, because it too calls Object.toString()
on opaque data handles.

So, the code for DataType.mapToString(...) should also use DataType.toString(Object).

But now I see a recursion problem.  DataType.toString(Object) does not work for complex
types, so maps of maps will not be recursed properly.

So DataType.toString(Object) should probably be enhanced to work on Maps as well.

But now we have another problem ... PigDump wants to append L and F for Long values and Float
values.  But this won't work for nested structures.
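
To make the recursion and suffix points concrete, here is a minimal sketch of a single recursive formatter, not the actual Pig code. NestedFormatter and its format method are hypothetical names, and the [key#value] delimiters are chosen only for illustration. Because the L and F suffixes are applied inside the recursive function rather than by the caller, they come out the same at any nesting depth, and maps of maps are handled by the same code path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Pig.
class NestedFormatter {

    // Formats a value, recursing into maps so nested values get the same treatment.
    static String format(Object d) {
        if (d == null) {
            return "";
        }
        if (d instanceof Map) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) d).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                // Keys and values both go back through format(), so a map
                // nested inside a map is formatted by this same method.
                sb.append(format(e.getKey())).append('#').append(format(e.getValue()));
            }
            return sb.append(']').toString();
        }
        // Suffixes live here, so they apply at every nesting level.
        if (d instanceof Long) {
            return d + "L";
        }
        if (d instanceof Float) {
            return d + "F";
        }
        return d.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("b", 1.5f);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("a", 42L);
        outer.put("inner", inner);
        System.out.println(format(outer)); // prints [a#42L,inner#[b#1.5F]]
    }
}
```

With this shape, a tuple's toString() would only need to call the formatter once per field and would no longer need its own instanceof checks for Map, Long, or Float.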

> PigDump does not properly output Chinese UTF8 characters - they are displayed as question marks ??
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-771
>                 URL: https://issues.apache.org/jira/browse/PIG-771
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>
> PigDump does not properly output Chinese UTF8 characters.
> The reason for this is that the function Tuple.toString() is called.
> DefaultTuple implements Tuple.toString() and it calls Object.toString() on the opaque object d.
> I think the code should instead be changed to call the new DataType.toString() function.
> {code}
>     @Override
>     public String toString() {
>         StringBuilder sb = new StringBuilder();
>         sb.append('(');
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object d = it.next();
>             if(d != null) {
>                 if(d instanceof Map) {
>                     sb.append(DataType.mapToString((Map<Object, Object>)d));
>                 } else {
>                     sb.append(DataType.toString(d));  // <<< Change this one line
>                     if(d instanceof Long) {
>                         sb.append("L");
>                     } else if(d instanceof Float) {
>                         sb.append("F");
>                     }
>                 }
>             } else {
>                 sb.append("");
>             }
>             if (it.hasNext())
>                 sb.append(",");
>         }
>         sb.append(')');
>         return sb.toString();
>     }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

