hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-136) SerDe should escape some special characters
Date Wed, 21 Jan 2009 07:06:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665737#action_12665737 ]

Zheng Shao commented on HIVE-136:
---------------------------------

Proposal:

1. For serialization:
* \ -> \\
* newline -> \n
* carriage return -> \r

The following characters are escaped only if they are column/item/key separators or quotations:
* null character -> \000 (octal number)
* ^A -> \001
* ^B -> \002
...
* tab -> \t
...
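
A minimal sketch of the serialization rules above, in Python rather than Hive's Java, for illustration only. The function name `escape` and the `separators` parameter are hypothetical and not part of any Hive API. It assumes every separator is an ASCII character, and that any separator other than tab is written as a three-digit octal escape, following the pattern of the \000, \001, \002 entries.

```python
def escape(text: str, separators: str = "\x01") -> str:
    """Escape one field value using the proposed serialization rules.

    `separators` holds the column/item/key separator characters
    (assumed to be ASCII); its default of ^A is only an example.
    """
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")          # \ -> \\
        elif ch == "\n":
            out.append("\\n")           # newline -> \n
        elif ch == "\r":
            out.append("\\r")           # carriage return -> \r
        elif ch in separators:
            if ch == "\t":
                out.append("\\t")       # tab -> \t
            else:
                # Assumption: other separators become a 3-digit octal escape.
                out.append("\\%03o" % ord(ch))
        else:
            out.append(ch)              # everything else passes through
    return "".join(out)
```

With these rules a tab is left untouched unless it is configured as a separator, so `escape("a\tb")` returns the input unchanged while `escape("a\tb", separators="\t")` produces a backslash followed by `t`.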


2. For deserialization:
* \\ -> \
* \n -> newline
* \r -> carriage return
* \xxx -> the character with octal code xxx (xxx from 000 to 177, i.e. 0 to 127 in decimal)
* \0 -> null character
* \ (without a match above) -> \
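
The deserialization rules above could be sketched as follows, in Python rather than Hive's Java, for illustration only. The function name `unescape` is hypothetical. One assumption goes beyond the list: the list does not mention \t, so this sketch decodes it to a tab purely for symmetry with the serialization side. Three-digit octal escapes are tried before the short \0 form, and a backslash with no match is kept as a literal backslash.

```python
def unescape(text: str) -> str:
    """Reverse the proposed escaping for one field value."""
    # Two-character escapes. "t" is an assumption: it is not in the
    # deserialization list, only in the serialization list.
    simple = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < n else ""
        octal = text[i + 1:i + 4]
        if nxt in simple:
            out.append(simple[nxt])
            i += 2
        elif (len(octal) == 3
              and all(c in "01234567" for c in octal)
              and int(octal, 8) <= 0o177):
            out.append(chr(int(octal, 8)))   # \xxx, octal 000..177
            i += 4
        elif nxt == "0":
            out.append("\x00")               # \0 -> null character
            i += 2
        else:
            out.append("\\")                 # unmatched \ stays a backslash
            i += 1
    return "".join(out)
```

Because values above octal 177 are outside the stated range, a sequence such as \200 is not treated as an escape here: the backslash is kept and the digits are read as ordinary characters.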


In this proposal, we don't support quotation (" and '). Quotation will be required to read
MySQL/Oracle dumped data, but I hope to address it in a different SerDe since we need to distinguish
the 


> SerDe should escape some special characters
> -------------------------------------------
>
>                 Key: HIVE-136
>                 URL: https://issues.apache.org/jira/browse/HIVE-136
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Zheng Shao
>            Priority: Critical
>
> MetadataTypedColumnsetSerDe and DynamicSerDe should escape some special characters like '\n' or the column/item/key separator.
> Otherwise the data will look corrupted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

