hive-dev mailing list archives

From "ronan stokes (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8763) Support for use of enclosed quotes in LazySimpleSerde
Date Thu, 06 Nov 2014 21:14:33 GMT
ronan stokes created HIVE-8763:
----------------------------------

             Summary: Support for use of enclosed quotes in LazySimpleSerde
                 Key: HIVE-8763
                 URL: https://issues.apache.org/jira/browse/HIVE-8763
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.13.1, 0.13.0, 0.12.0, 0.11.0
         Environment: Many; verified on CentOS / Red Hat with CDH
            Reporter: ronan stokes


Currently the LazySimpleSerde does not support quoting of delimited fields, so a separator
character cannot appear inside a quoted field. As a result, many common use cases for
CSV-style data require alternative approaches.

Key scenarios that do not work include:
(3 column row for int, string, float delimited by ',')
100,"3.5 inch hard drive, quantity 10",2650.30
100,"3.5 \" hard drive, quantity 10",2650.30
100,  "3.5 "" hard drive, quantity 10",  2650.30
100,"3.5 "" hard drive, quantity 10",2650.30


To address this, I have implemented a number of fixes in the deserialization stage of a copy
of the LazySimpleSerde:

For serialization, the code is unchanged: the relevant embedded characters are escaped.

Assuming a row with 3 fields - SKU ID, description, price, delimited by ','

1) allow use of enclosed quotes around a string field 
For example 

100,"3.5 inch hard drive, quantity 10",2650.30

2) support escaping of quotes within a field to allow use of an embedded quote
For example

100,"3.5 \" hard drive, quantity 10",2650.30

3) support for old-style CSV embedded quotes (a doubled quote character)
For example

100,"3.5 "" hard drive, quantity 10",2650.30

4) support for skipping of leading spaces in a field
For example

100,  "3.5 "" hard drive, quantity 10",  2650.30

In each case, these are evaluated as 

All of these are enabled or disabled using serde properties for the quote character
(quotechar), whether enclosed quotes are supported, and whether doubled embedded quotes are
treated as a single quote (of the same character type).
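
To make the four behaviours above concrete, here is a minimal Java sketch of a quote-aware
field splitter. It is not the patch itself and does not use any Hive API: the class name
QuotedFieldSplitter, the split method and all of its parameters are hypothetical stand-ins
for the serde properties described above. The backslash escape character and a separate
flag for skipping leading spaces are assumptions; the message does not name a property for
either.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of LazySimpleSerde.
final class QuotedFieldSplitter {

    // Splits one delimited row into fields.
    // enclosedQuotes:    treat a quote at the start of a field as opening a quoted field
    // doubledQuotes:     inside a quoted field, read two quote chars as one literal quote
    // skipLeadingSpaces: ignore spaces before the first character of a field (assumed flag)
    static List<String> split(String line, char sep, char quote, char escape,
                              boolean enclosedQuotes, boolean doubledQuotes,
                              boolean skipLeadingSpaces) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (atFieldStart && skipLeadingSpaces && c == ' ') {
                i++;                              // scenario 4: drop leading spaces
                continue;
            }
            if (atFieldStart && enclosedQuotes && c == quote) {
                inQuotes = true;                  // scenario 1: opening quote
                atFieldStart = false;
                i++;
                continue;
            }
            atFieldStart = false;
            boolean hasNext = i + 1 < line.length();
            if (inQuotes) {
                if (c == escape && hasNext
                        && (line.charAt(i + 1) == quote || line.charAt(i + 1) == escape)) {
                    cur.append(line.charAt(i + 1)); // scenario 2: escaped quote
                    i += 2;
                } else if (c == quote && doubledQuotes && hasNext
                        && line.charAt(i + 1) == quote) {
                    cur.append(quote);            // scenario 3: "" becomes "
                    i += 2;
                } else if (c == quote) {
                    inQuotes = false;             // closing quote
                    i++;
                } else {
                    cur.append(c);                // separators are literal inside quotes
                    i++;
                }
            } else if (c == sep) {
                fields.add(cur.toString());       // end of field
                cur.setLength(0);
                atFieldStart = true;
                i++;
            } else {
                cur.append(c);
                i++;
            }
        }
        fields.add(cur.toString());               // last field
        return fields;
    }
}
```

With all three flags set to true, ',' as the separator, '"' as the quote character and
'\' as the escape character, each of the four example rows above splits into three fields,
with the description field holding the text between the enclosing quotes. With
enclosedQuotes set to false the quote characters are kept as ordinary data, so a comma
inside quotes splits the field, which mirrors the current behaviour described at the top.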





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
