hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-497) dump does not deal with non-ascii data
Date Thu, 30 Oct 2008 01:08:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Pradeep Kamath updated PIG-497:

    Assignee: Pradeep Kamath  (was: Santhosh Srinivasan)
      Status: Patch Available  (was: Open)

Patch attached.
There were three issues which were resolved:
- DataReaderWriter was using DataOutput.writeBytes(String) instead of DataOutput.writeUTF(String).
Likewise, it was using DataInput.readFully(byte[]) instead of DataInput.readUTF(). The earlier
calls write only the low-order 8 bits of each character in the string, which corrupts
multi-byte UTF-8 data.
- illustrate and dump eventually use System.out.println() to output results, and System.out.println()
writes bytes in the platform default encoding, which is not guaranteed to be UTF-8. This was
changed to System.out.write(String.getBytes("UTF-8")).
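
The two fixes above can be illustrated with a minimal, standalone sketch. This is not the
attached patch: the class name Utf8RoundTripSketch and its helper methods are hypothetical, and
the ISO-8859-1 decoding in the lossy path is only an assumption made so the damaged bytes can be
turned back into a String for comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the Pig code base.
class Utf8RoundTripSketch {

    // Old path: writeBytes(String) emits only the low-order byte of each char.
    static String roundTripWithWriteBytes(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeBytes(s);
            out.flush();
            byte[] raw = new byte[buf.size()];
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray())).readFully(raw);
            // Assumed decoding for illustration; the high-order bits are already gone.
            return new String(raw, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // New path: writeUTF/readUTF use a length-prefixed (modified) UTF-8 encoding.
    static String roundTripWithWriteUTF(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            return new DataInputStream(new ByteArrayInputStream(buf.toByteArray())).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Output path: encode explicitly as UTF-8 instead of relying on the platform default.
    static void writeUtf8Line(OutputStream out, String s) {
        try {
            out.write(s.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three non-ASCII (CJK) characters
        System.out.println("writeBytes round trip intact: " + roundTripWithWriteBytes(text).equals(text));
        System.out.println("writeUTF round trip intact:   " + roundTripWithWriteUTF(text).equals(text));
        writeUtf8Line(System.out, text);
    }
}
```

Note that writeUTF(String) stores a two-byte length prefix, so a single call cannot encode a
string whose encoded form exceeds 65535 bytes; whether that matters for DataReaderWriter is not
stated in this message.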

> dump does not deal with non-ascii data
> --------------------------------------
>                 Key: PIG-497
>                 URL: https://issues.apache.org/jira/browse/PIG-497
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
