hive-dev mailing list archives

From "Alon Goldshuv (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
Date Sun, 09 Nov 2014 13:25:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203915#comment-14203915 ]

Alon Goldshuv commented on HIVE-7777:
-------------------------------------

Either way should work (adding OpenCSV parsing to LazySimpleSerde or adding type support to
this new CSV serde).

IMO the deciding factor should be performance. If adding quote stripping to LazySimpleSerde
would slow down simple non-quoted parsing (e.g., because the parser would have to examine its
state after each byte instead of seeking quickly to the next line terminator), I'd say the
solution is best represented as 2 separate serdes (as proposed in this JIRA).
If that isn't the case, though, a single serde (as proposed by [~rstokes]) is more elegant and user-friendly.
[~rstokes], can you share information in that respect, or share the code for your modified
LazySimpleSerde?
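To make the performance concern concrete, here is a simplified, hypothetical Java sketch. It is
not the actual LazySimpleSerde or OpenCSV code; the class and method names are made up for
illustration, and it assumes records end with '\n' and fields are quoted with double quotes.
Without quoting, the end of a record can be found with a single indexOf call; once quotes are
allowed, a newline inside a quoted field is not a terminator, so every character has to be
examined while tracking quote state.

```java
public class RecordEndSketch {

    // Unquoted data: the record ends at the next '\n', so the scan can jump
    // straight there with String.indexOf and keeps no per-character state.
    static int recordEndSimple(String buf, int from) {
        int end = buf.indexOf('\n', from);
        return end < 0 ? buf.length() : end;
    }

    // Quoted data: a '\n' inside double quotes is field content, not a record
    // terminator, so every character is inspected while tracking whether the
    // scanner is currently inside a quoted field.
    static int recordEndQuoted(String buf, int from) {
        boolean inQuotes = false;
        for (int i = from; i < buf.length(); i++) {
            char c = buf.charAt(i);
            if (c == '"') {
                // An escaped "" toggles twice, i.e. no net change in state.
                inQuotes = !inQuotes;
            } else if (c == '\n' && !inQuotes) {
                return i;
            }
        }
        return buf.length();
    }

    public static void main(String[] args) {
        // First record contains a quoted field with an embedded newline.
        String buf = "1,\"line one\nline two\"\n2,plain\n";
        System.out.println(recordEndSimple(buf, 0));
        System.out.println(recordEndQuoted(buf, 0));
    }
}
```

For the sample buffer, the simple scan stops at index 11, inside the quoted field, while the
quote-aware scan correctly stops at index 21. Whether that extra per-character work matters in
practice would have to be measured against the real serde.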

> Add CSV Serde based on OpenCSV
> ------------------------------
>
>                 Key: HIVE-7777
>                 URL: https://issues.apache.org/jira/browse/HIVE-7777
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch,
> csv-serde-master.zip
>
>
> There is no official CSV serde for Hive, although there is an open source project
> on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very commonly used data
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
