hive-issues mailing list archives

From "Shawn Weeks (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (HIVE-14867) "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
Date Tue, 17 Oct 2017 13:50:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Weeks updated HIVE-14867:
-------------------------------
    Comment: was deleted

(was: Ran across this issue while troubleshooting for a customer. It essentially makes this
SerDe useless, as it always puts garbage in the last column. Is there a reason we can't
just add multi-character field delimiters to the other text SerDes and deprecate this one,
since it doesn't appear to be maintained?)

> "serialization.last.column.takes.rest" does not work for MultiDelimitSerDe
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14867
>                 URL: https://issues.apache.org/jira/browse/HIVE-14867
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.3.0
>            Reporter: Niklaus Xiao
>            Assignee: Niklaus Xiao
>
> Create a table with MultiDelimitSerDe:
> {code}
> CREATE TABLE foo (a string, b string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
> WITH SERDEPROPERTIES ("field.delim"="|@|", "collection.delim"=":", "mapkey.delim"="@")
> STORED AS TEXTFILE;
> {code}
> Load data into the table:
> {code}
> 1|@|Lily|@|HW|@|abc
> 2|@|Lucy|@|LX|@|123
> 3|@|Lilei|@|XX|@|3434
> {code}
> Select data from this table:
> {code}
> select * from foo;
> +---------+------------------+--+
> |  foo.a  |      foo.b       |
> +---------+------------------+--+
> | 1       | Lily^AHW^Aabc    |
> | 2       | Lucy^ALX^A123    |
> | 3       | Lilei^AXX^A3434  |
> +---------+------------------+--+
> 3 rows selected (0.905 seconds)
> {code}
> You can see that the last column takes all the remaining data, and that the field delimiter
> is replaced with the default ^A.
> lastColumnTakesRest should be false by default:
> {code}
>     String lastColumnTakesRestString = tbl
>         .getProperty(serdeConstants.SERIALIZATION_LAST_COLUMN_TAKES_REST);
>     lastColumnTakesRest = (lastColumnTakesRestString != null && lastColumnTakesRestString
>         .equalsIgnoreCase("true"));
> {code}
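
For illustration, here is a minimal standalone sketch of the behavior the report appears to expect. It is not Hive's implementation. The class name, the splitRow helper, and the use of String.split are assumptions made for this example; only the property name and the null-safe parse come from the issue text. With the property unset, the extra fields are dropped; with it set to "true", the last column keeps the remainder of the row with the original delimiter intact.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch, not the MultiDelimitSerDe source.
final class LastColumnTakesRestSketch {

    // Property name taken from the issue title.
    static final String LAST_COLUMN_TAKES_REST = "serialization.last.column.takes.rest";

    // Same null-safe parse as the snippet quoted in the report: unset means false.
    static boolean lastColumnTakesRest(Properties tbl) {
        String value = tbl.getProperty(LAST_COLUMN_TAKES_REST);
        return value != null && value.equalsIgnoreCase("true");
    }

    // Assumed helper: splits one text row on a multi-character delimiter.
    static String[] splitRow(String row, String fieldDelim, int numColumns,
                             boolean lastColumnTakesRest) {
        String quoted = Pattern.quote(fieldDelim); // treat "|@|" literally, not as a regex
        if (lastColumnTakesRest) {
            // A limit of numColumns leaves the unsplit remainder in the last field.
            return row.split(quoted, numColumns);
        }
        // Otherwise split everything and keep only the declared columns
        // (missing trailing columns become null).
        return Arrays.copyOf(row.split(quoted, -1), numColumns);
    }

    public static void main(String[] args) {
        Properties tbl = new Properties(); // property not set, as in the report
        String row = "1|@|Lily|@|HW|@|abc";

        boolean flag = lastColumnTakesRest(tbl);
        System.out.println(flag);                                            // false
        System.out.println(Arrays.toString(splitRow(row, "|@|", 2, flag)));  // [1, Lily]
        System.out.println(Arrays.toString(splitRow(row, "|@|", 2, true)));  // [1, Lily|@|HW|@|abc]
    }
}
```

Under these assumptions, the first row of the example table would read a = 1, b = Lily when the property is unset, rather than b = Lily^AHW^Aabc.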



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
