hive-dev mailing list archives

From "Alon Goldshuv (JIRA)" <>
Subject [jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
Date Thu, 06 Nov 2014 10:24:34 GMT


Alon Goldshuv commented on HIVE-7777:

While the SerDe works, it has an issue that is quite serious IMO: it forces all column types
to STRING. This means that queries over data that isn't all string-typed can return wrong
results. The unit tests contain a single example table, and it uses only string columns. The
tests linked here have many tables with non-string types, but all of their queries seem to be a
simple COUNT(*), which won't catch the problem.

Consider the following example:

CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10))
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "'", "escapeChar" = "\\")
LOCATION '<some location>'
TBLPROPERTIES ("skip.header.line.count" = "1");

Now consider this query:

hive> select min(totalprice) from test;

In this case, given my data, the result should have been 874.89, but the query returned
100001.57, because that value sorts first under the byte ordering of a string type. This is a
wrong result.
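To see why byte ordering picks the wrong value, here is a small standalone Java sketch. It is not Hive or SerDe code, and the class and method names are made up for illustration. It takes the two values mentioned above and computes MIN once over the raw strings (what happens when every column comes back as STRING) and once after parsing them as BigDecimal (what a DECIMAL(38,10) column should do).

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of Hive.
public class StringVsDecimalMin {
    // Only the two values quoted in the comment; not the full data set.
    static final List<String> RAW = Arrays.asList("874.89", "100001.57");

    // MIN over raw text: String.compareTo is lexicographic, so the first
    // differing character decides ('1' < '8'), regardless of magnitude.
    static String minAsString(List<String> values) {
        return values.stream()
                .min(Comparator.naturalOrder())
                .orElseThrow(IllegalArgumentException::new);
    }

    // MIN after converting each value to a numeric type.
    static BigDecimal minAsDecimal(List<String> values) {
        return values.stream()
                .map(BigDecimal::new)
                .min(Comparator.naturalOrder())
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        System.out.println("string min:  " + minAsString(RAW));   // 100001.57
        System.out.println("decimal min: " + minAsDecimal(RAW));  // 874.89
    }
}
```

COUNT(*) gives the same answer under either interpretation, which is why a COUNT-only test suite never notices the type loss; any ordering-sensitive aggregate such as MIN or MAX does.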

hive> desc extended test;
o_totalprice        	string              	from deserializer

I apologize if it's a false alarm and I'm misusing the DDL somehow. Otherwise, this is a real
concern, since wrong query results are a bad thing...

> Add CSV Serde based on OpenCSV
> ------------------------------
>                 Key: HIVE-7777
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>              Labels: TODOC14
>             Fix For: 0.14.0
>         Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch,
> There is no official support for csvSerde for Hive while there is an open source project
> in GitHub ( CSV is of high frequency in use as a data

This message was sent by Atlassian JIRA
