hive-dev mailing list archives

From "Ferdinand Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8763) Support for use of enclosed quotes in LazySimpleSerde
Date Thu, 27 Nov 2014 01:22:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227096#comment-14227096 ]

Ferdinand Xu commented on HIVE-8763:
------------------------------------

Hi [~rstokes], can you please create a review board entry for your patch?

> Support for use of enclosed quotes in LazySimpleSerde
> -----------------------------------------------------
>
>                 Key: HIVE-8763
>                 URL: https://issues.apache.org/jira/browse/HIVE-8763
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
>         Environment: many - verified on Centos / Redhat with CDH
>            Reporter: ronan stokes
>         Attachments: HIVE-8763.1.patch
>
>
> Currently LazySimpleSerde does not support quoting of delimited fields, so a separator cannot appear within a quoted field. This means having to use alternatives for many common use cases for CSV-style data.
> Key scenarios that do not work include:
> (3 column row for int, string, float delimited by ',')
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 100,  "3.5 "" hard drive, quantity 10",  2650.30
> 100,"3.5 "" hard drive, quantity 10",2650.30
> I have implemented a number of fixes in the deserialization stage of a copy of LazySimpleSerde to address this.
> For serialization, the code is unchanged, with the relevant embedded characters being escaped.
> Assuming a row with 3 fields - SKU ID, description, price, delimited by ','
> 1) allow use of enclosed quotes around a string field 
> For example 
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 2) support escaping of quotes within a field to allow use of an embedded quote
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 3) support for old-style CSV embedded (doubled) quotes
> For example
> 100,"3.5 "" hard drive, quantity 10",2650.30
> 4) support for skipping leading spaces in a field
> For example (note space between first ',' and opening quote)
> 100,  "3.5 "" hard drive, quantity 10",  2650.30
> In each case, with these changes the row is evaluated as though the delimiters and embedded quotes had been escaped, e.g.:
> 100, 3.5 \" hard drive\, quantity 10,  2650.30
> All of these are enabled or disabled using serde properties: the quote character, whether enclosed quotes are supported, and whether doubled embedded quotes are treated as a single quote (of the same character type).
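
To make the four scenarios in the description concrete, here is a minimal Java sketch of a row splitter that behaves the way the description says. It is not the code in HIVE-8763.1.patch and does not use any Hive API. The class name QuotedFieldSplitter, its constructor parameters, and the choice to skip leading spaces in every field are assumptions made for illustration only; the three options simply mirror the serde properties mentioned above (quote character, enclosed quotes on/off, doubled quotes treated as a single quote).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Hive or of the attached patch.
final class QuotedFieldSplitter {
    private final char separator;
    private final char quoteChar;
    private final char escapeChar;
    private final boolean enclosedQuotes;        // honor quotes around a field
    private final boolean doubledQuoteAsSingle;  // treat "" inside quotes as "

    QuotedFieldSplitter(char separator, char quoteChar, char escapeChar,
                        boolean enclosedQuotes, boolean doubledQuoteAsSingle) {
        this.separator = separator;
        this.quoteChar = quoteChar;
        this.escapeChar = escapeChar;
        this.enclosedQuotes = enclosedQuotes;
        this.doubledQuoteAsSingle = doubledQuoteAsSingle;
    }

    List<String> split(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        int i = 0;
        final int n = row.length();
        while (i < n) {
            char c = row.charAt(i);
            if (c == escapeChar && i + 1 < n) {
                // Scenario 2: an escaped character is taken literally.
                current.append(row.charAt(i + 1));
                atFieldStart = false;
                i += 2;
                continue;
            }
            if (inQuotes) {
                if (c == quoteChar) {
                    if (doubledQuoteAsSingle && i + 1 < n
                            && row.charAt(i + 1) == quoteChar) {
                        // Scenario 3: "" inside a quoted field means one quote.
                        current.append(quoteChar);
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote
                } else {
                    // Separators inside quotes are ordinary data (scenario 1).
                    current.append(c);
                }
                i++;
                continue;
            }
            if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
                atFieldStart = true;
            } else if (atFieldStart && c == ' ') {
                // Scenario 4: skip leading spaces before the field content.
            } else if (atFieldStart && enclosedQuotes && c == quoteChar) {
                inQuotes = true; // opening quote
                atFieldStart = false;
            } else {
                current.append(c);
                atFieldStart = false;
            }
            i++;
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With new QuotedFieldSplitter(',', '"', '\\', true, true), each of the four example rows splits into three fields, and the description field keeps its embedded comma. Constructing it with enclosedQuotes set to false makes the quote character ordinary data, so a quoted field containing a separator is split at that separator again, which is the behavior the issue reports for the current serde.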



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
