hive-issues mailing list archives

From "Aihua Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed
Date Mon, 19 Oct 2015 16:35:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963578#comment-14963578 ]

Aihua Xu commented on HIVE-1898:
--------------------------------

HIVE-11785 added support for escaping newline and carriage return characters in
LazySimpleSerDe, which should fix this issue. Intermediate results written with
LazySimpleSerDe will have newlines and carriage returns escaped, so LineRecordReader
can later split the data into lines correctly.
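
To illustrate the idea, here is a minimal sketch in Python. It is not the HIVE-11785
code and does not use any Hive API. The function names, the backslash escape character
and the tab field delimiter are assumptions chosen for the example. It shows why
escaping newline and carriage return inside a field lets a line-based reader recover
exactly one record per line.

```python
# Illustrative only: hypothetical helpers, not Hive's LazySimpleSerDe logic.
ESCAPE = "\\"  # assumed escape character (as in ESCAPED BY '\\')


def escape_field(value, escape=ESCAPE):
    """Replace newline, carriage return and the escape char with escaped forms."""
    out = []
    for ch in value:
        if ch == escape:
            out.append(escape + escape)
        elif ch == "\n":
            out.append(escape + "n")
        elif ch == "\r":
            out.append(escape + "r")
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(value, escape=ESCAPE):
    """Reverse escape_field: turn escaped sequences back into raw characters."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == escape and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"n": "\n", "r": "\r"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Two records whose second column contains a raw newline or carriage return.
# The sample data contains no tabs, so the delimiter itself is not escaped here.
rows = [
    ["page1", "first line\nsecond line"],
    ["page2", "ends with CR\r"],
]

# Write: escape every field, join fields with tabs and records with newlines.
encoded = "\n".join("\t".join(escape_field(f) for f in row) for row in rows)

# Read: split on the line terminator first (as a line-based reader would),
# then split each line into fields and unescape them.
lines = encoded.split("\n")
decoded = [[unescape_field(f) for f in line.split("\t")] for line in lines]
```

Without the escaping step, the first record would be split into two lines before any
field processing takes place, which is the behaviour described in the issue.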

> The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator
cannot be changed
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1898
>                 URL: https://issues.apache.org/jira/browse/HIVE-1898
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.5.0
>            Reporter: Josh Patterson
>            Priority: Minor
>
> If I want to preserve data in columns which contain a newline (web crawling, for instance),
> I cannot use the ESCAPED BY clause to escape the newlines (other characters, such as commas,
> escape fine, however). This may be because the line terminators, which are locked to
> newlines, are picked up first, and only then are the fields processed.
> This seems to be related to:
> "SerDe should escape some special characters"
> https://issues.apache.org/jira/browse/HIVE-136
> and
> "Implement "LINES TERMINATED BY""
> https://issues.apache.org/jira/browse/HIVE-302
> where at comment: https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435
> "This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader
which is in the Hadoop land."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
