hive-issues mailing list archives

From "Sameer Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11996) Row Delimiter other than '\n' throws error in Hive.
Date Wed, 30 Sep 2015 13:41:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Gupta updated HIVE-11996:
--------------------------------
    Summary: Row Delimiter other than '\n' throws error in Hive.  (was: CLONE - Allow other characters for LINES TERMINATED BY )

> Row Delimiter other than '\n' throws error in Hive.
> ---------------------------------------------------
>
>                 Key: HIVE-11996
>                 URL: https://issues.apache.org/jira/browse/HIVE-11996
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, Database/Schema, Hive
>    Affects Versions: 0.12.0
>            Reporter: Sameer Gupta
>            Assignee: Ashutosh Chauhan
>            Priority: Critical
>              Labels: DDL, Delimiter, Hive, Line,, SerDe
>
> Error Code and Error Text:
>         "LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\u0001'' (state=42000,code=40000)"
> Issue Description:
> The Hive Language Manual states that changing the line delimiter is possible.
> row_format
>   : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
>         [COLLECTION ITEMS TERMINATED BY char]
>         [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
>         [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
>   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
> Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
> But defining [LINES TERMINATED BY char] produces an error stating that Hive only supports
> newline '\n' right now, which essentially means that the choice of newline character is
> static. Why this appears as a configurable item in the DDL is unclear.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
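The behaviour described above amounts to a check of roughly the following shape. This is a minimal Python sketch for illustration only, not Hive's code (the linked analyzer is Java). The function name `check_line_delimiter` and the `SemanticError` class are hypothetical; only the wording of the error message is taken from this report.

```python
class SemanticError(Exception):
    """Hypothetical stand-in for the error raised by the DDL statement."""


def check_line_delimiter(delim: str) -> str:
    """Accept only newline as the row delimiter, as the report describes."""
    if delim != "\n":
        # Message wording copied from the error quoted in this report.
        raise SemanticError(
            "LINES TERMINATED BY only supports newline '\\n' right now. "
            f"Error encountered near token {delim!r}"
        )
    return delim
```

Under this sketch, `check_line_delimiter("\n")` passes, while any other character, such as `"\u0001"`, raises the error, which matches the reporter's observation that the delimiter is effectively fixed.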
> Impact:
> While storing free-form data such as email or comments, it is fairly common to have a '\n'
> character crop up. A lot of free-form ETL on Linux, using the majority of ETL tools, also
> adds a $ (newline character) to maintain formatting.
> As the Hive Language Manual shows this as a configurable property, it also leads to misleading
> solution designs, which fail when the CREATE statement is run in the development phase.
> Having the ability to choose your row delimiter is a very basic necessity, and it is
> alarming that this is not supported up to Hive 14, to the best of my knowledge.
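
The impact described in the report can be illustrated outside Hive. The following is a minimal Python sketch, assuming a hypothetical two-column record (an id and a free-form comment) separated by Ctrl-A (`\u0001`, Hive's default field delimiter). The alternative row delimiter `\u001e` and the helper names `serialize` and `parse` are arbitrary choices for illustration. It shows how a reader that splits rows on '\n' breaks one logical record in two, while a different row delimiter keeps it intact.

```python
FIELD_DELIM = "\u0001"    # Ctrl-A, Hive's default field delimiter
ALT_ROW_DELIM = "\u001e"  # arbitrary alternative row delimiter (assumption)

records = [
    ("1", "short comment"),
    ("2", "line one\nline two"),  # free-form text with an embedded newline
]


def serialize(rows, row_delim):
    """Join fields with FIELD_DELIM and terminate each row with row_delim."""
    return "".join(FIELD_DELIM.join(row) + row_delim for row in rows)


def parse(data, row_delim):
    """Split on row_delim, then on FIELD_DELIM, skipping the empty tail."""
    return [tuple(line.split(FIELD_DELIM))
            for line in data.split(row_delim) if line]


newline_rows = parse(serialize(records, "\n"), "\n")
alt_rows = parse(serialize(records, ALT_ROW_DELIM), ALT_ROW_DELIM)

print(len(newline_rows))    # 3: the second record was split in two
print(alt_rows == records)  # True: records survive the round trip
```

With '\n' as the only permitted row delimiter, the embedded newline has to be escaped or stripped upstream before loading, which is the extra ETL burden the report is pointing at.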



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
