hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-560) UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage()
Date Thu, 11 Dec 2008 19:18:44 GMT
UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536
bytes (in UTF8 form) using BinStorage()
-------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-560
                 URL: https://issues.apache.org/jira/browse/PIG-560
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch


BinStorage() uses the Java DataOutput.writeUTF() and DataInput.readUTF() APIs to write Strings
as UTF-8 bytes and to read them back. From the Javadoc: "First, the total number of bytes
needed to represent all the characters of s is calculated. If this number is larger than 65535,
then a UTFDataFormatException is thrown." (The limit exists because writeUTF() uses 2 bytes to
store the number of bytes.) One way around this is to avoid writeUTF()/readUTF() and
instead convert the string by hand to the corresponding UTF-8 byte[] (using String.getBytes("UTF-8")),
then write the length of the byte array as an int, followed by the bytes. Because a Java int is a
signed 32-bit value, this allows an encoded size of up to 2^31 - 1 bytes.
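
The workaround described above could be sketched roughly as follows. This is only an
illustration, not the actual BinStorage patch: the class name LongStringIO and the helper
names writeLongString()/readLongString() are hypothetical, and only standard java.io calls
are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative and not part of Pig.
public class LongStringIO {

    // Write a 4-byte length prefix followed by the UTF-8 bytes of s.
    static void writeLongString(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // Read back a string written by writeLongString().
    static String readLongString(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a 70000-character ASCII string (70000 bytes in UTF-8).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 70000; i++) {
            sb.append('x');
        }
        String big = sb.toString();

        // writeUTF() rejects it because the encoded form exceeds 65535 bytes.
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(big);
        } catch (UTFDataFormatException e) {
            System.out.println("writeUTF failed: " + e.getMessage());
        }

        // The length-prefixed form round-trips the same string.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeLongString(new DataOutputStream(buffer), big);
        DataInput in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println("round trip ok: " + big.equals(readLongString(in)));
    }
}
```

Note that writeUTF() emits Java's "modified UTF-8" while getBytes() emits standard UTF-8, so
data written one way cannot be read back the other way; both the writer and the reader would
need to switch together.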

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

