hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-136) SerDe should escape some special characters
Date Fri, 23 Jan 2009 17:53:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666627#action_12666627 ]

Joydeep Sen Sarma commented on HIVE-136:

For #1: if the row separator can be something other than newline, I am guessing you want to
generalize the escaping to whatever the row separator is (as opposed to specifically escaping
newlines), right?

Also, if there is a literal '\n' sequence (a backslash followed by 'n') in the original string,
are we going to convert it into '\\n' during serialization and then map '\\' back to '\' during
deserialization, so that the deserialized string still contains '\n'? This should generalize to
the other separators as well, of course.
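A minimal Java sketch of what #1 could look like, assuming a backslash escape character and assuming no separator is ever 'n', 'r' or a backslash. The class and method names (SeparatorEscaper, escape, unescape, split) are hypothetical, not Hive's API. A real newline becomes the two characters backslash + 'n', so no raw row separator survives; a literal backslash is doubled, so a literal "\n" in the data round-trips unchanged; field separators are prefixed with a backslash, which means the field splitter has to skip escaped separators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's actual SerDe code.
// Assumes the escape character is '\\' and that no separator is 'n', 'r' or '\\'.
final class SeparatorEscaper {
    private static final char ESC = '\\';

    /** Escapes ESC itself, '\n', '\r' and every character in `separators`. */
    static String escape(String s, String separators) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESC) {
                out.append(ESC).append(ESC);   // literal backslash is doubled
            } else if (c == '\n') {
                out.append(ESC).append('n');   // real newline -> backslash + 'n'
            } else if (c == '\r') {
                out.append(ESC).append('r');   // real carriage return -> backslash + 'r'
            } else if (separators.indexOf(c) >= 0) {
                out.append(ESC).append(c);     // separator -> backslash + separator
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Inverse of escape(); any other backslash sequence is left as-is. */
    static String unescape(String s, String separators) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESC && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == ESC || separators.indexOf(next) >= 0) {
                    out.append(next);          // "\\" -> "\", "\<sep>" -> "<sep>"
                    i++;
                    continue;
                }
                if (next == 'n' || next == 'r') {
                    out.append(next == 'n' ? '\n' : '\r');
                    i++;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Splits an escaped line on `sep`, treating a separator preceded by ESC as data. */
    static List<String> split(String line, char sep) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ESC) {
                i++;                           // skip whatever character is escaped
            } else if (c == sep) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start));
        return fields;
    }
}
```

For example, with a tab as the field separator, the literal text C:\new serializes to C:\\new and deserializes back to C:\new, while a real newline serializes to backslash + 'n' and deserializes back to a newline, so the two cases stay distinguishable.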

#2: I am uncomfortable unescaping more than what we escape. For example, let's say there is a
\XXX sequence in some text file, and we need to feed it into a transform script. We unescape
when reading the text file, but we do not escape it back when sending it to the script, so the
user script does not see the raw data in the file.

Presumably, if the user put \XXX in the file, they knew how to handle it in their scripts, and
we would be needlessly tampering with this data.
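To make #2 concrete, here is a hypothetical Java sketch (class and method names are illustrative only) contrasting an unescape that decodes only the characters the serializer itself escapes with one that strips every backslash. With the conservative version, a \XXX sequence from a file that Hive did not write reaches the transform script untouched; the aggressive version silently turns it into XXX.

```java
// Hypothetical illustration of point #2; not Hive code.
final class UnescapePolicies {
    /** Conservative: decodes "\c" only when c is in `escaped`; any other "\X" is kept verbatim. */
    static String unescapeKnown(String s, String escaped) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && escaped.indexOf(s.charAt(i + 1)) >= 0) {
                out.append(s.charAt(++i));   // drop the escape char, keep the escaped char
            } else {
                out.append(c);               // raw data, including "\X", passes through
            }
        }
        return out.toString();
    }

    /** Aggressive: drops every backslash that precedes another character. */
    static String unescapeEverything(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i));   // "\X" silently becomes "X"
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Here the `escaped` set would be exactly the characters the serializer escapes (in this sketch, the backslash and the field separators), which keeps escaping and unescaping symmetric.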

If more aggressive unescaping is required, we can always provide unescape UDFs. That would be
much better, since we could define standard semantics for the unescaping (json/html/xml
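As an illustration of an unescape function with standard semantics, here is a minimal Java sketch of JSON string unescaping (the escapes listed in RFC 8259, section 7). The class and method names are hypothetical, and wrapping it as a Hive UDF is not shown; the point is only that such a function has well-defined, documented behavior, including rejecting malformed input instead of guessing.

```java
// Hypothetical sketch of an explicit unescape function with JSON semantics
// (RFC 8259, section 7). Wiring it into Hive as a UDF is not shown.
final class JsonUnescape {
    /** Returns the value of one ASCII hex digit, or -1 if c is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes JSON string escapes; throws IllegalArgumentException on malformed input. */
    static String unescapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("dangling backslash at end of input");
            }
            char e = s.charAt(++i);
            switch (e) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u': {
                    // backslash-u is followed by four hex digits (one UTF-16 code unit)
                    if (i + 4 >= s.length()) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        int d = hex(s.charAt(i + k));
                        if (d < 0) {
                            throw new IllegalArgumentException("bad hex digit in unicode escape");
                        }
                        code = code * 16 + d;
                    }
                    out.append((char) code);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }
}
```

Because a UDF like this is applied explicitly in a query, the user opts into its semantics, rather than having the SerDe rewrite raw file data behind their back.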

> SerDe should escape some special characters
> -------------------------------------------
>                 Key: HIVE-136
>                 URL: https://issues.apache.org/jira/browse/HIVE-136
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Zheng Shao
>            Priority: Critical
> MetadataTypedColumnsetSerDe and DynamicSerDe should escape some special characters like
> '\n' or the column/item/key separator.
> Otherwise the data will look corrupted.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
