hive-issues mailing list archives

Subject [jira] [Commented] (HIVE-21240) JSON SerDe Re-Write
Date Wed, 27 Feb 2019 03:45:00 GMT


BELUGA BEHR commented on HIVE-21240:

All unit tests are passing [~bslim] [~kgyrtkirk]. Please consider this patch for inclusion
in the project. I understand there is some hesitation regarding the change in return type:
previously the SerDe returned a native array, and now it returns a Collection (List).
I think it's better to work with Java Collections than with native arrays, and if we're going
to change the return value at all, a major (4.0) release is the appropriate time to introduce
such a change.
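
For anyone with downstream code that consumes the deserialized row directly, here is a minimal sketch of how a caller could tolerate both the old array return type and the new List return type. The class and method names (RowFields, asList) are hypothetical and are not part of the patch or of Hive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: normalizes a deserialized row to a List.
final class RowFields {
    private RowFields() {}

    /** Returns the row's fields as a List, whether the row is an Object[] or already a List. */
    static List<?> asList(Object row) {
        if (row == null) {
            return null;
        }
        if (row instanceof List) {
            return (List<?>) row;                   // new behaviour: already a List
        }
        if (row instanceof Object[]) {
            return Arrays.asList((Object[]) row);   // old behaviour: wrap the array, no copy
        }
        throw new IllegalArgumentException("Unsupported row type: " + row.getClass().getName());
    }

    public static void main(String[] args) {
        Object oldStyle = new Object[] {"a", 1, null};
        Object newStyle = Arrays.asList("a", 1, null);
        // Both representations compare equal once viewed as Lists.
        System.out.println(asList(oldStyle).equals(asList(newStyle)));
    }
}
```

Code that only reads fields through an ObjectInspector should not need anything like this; the sketch is only relevant to callers that cast the returned object themselves.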

> JSON SerDe Re-Write
> -------------------
>                 Key: HIVE-21240
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, HIVE-21240.10.patch, HIVE-21240.11.patch,
HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, HIVE-21240.3.patch,
HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch,
HIVE-24240.8.patch, kafka_storage_handler.diff
>          Time Spent: 10m
>  Remaining Estimate: 0h
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for each row processed, for each column in the row
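
Three of the items above (the column-name cache, base-64 decoding, and blank-line handling) can be sketched with the JDK alone. This is an illustration of the ideas under stated assumptions, not the code in the attached patches: the class name ColumnIndexSketch and its methods are hypothetical, and the actual patch works on Jackson tree nodes and Hive ObjectInspectors, which are left out here.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the patch itself.
final class ColumnIndexSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final int columnCount;

    ColumnIndexSketch(List<String> columnNames) {
        columnCount = columnNames.size();
        // Build the name -> index map once, instead of scanning the
        // column list (O(n)) for every field of every row.
        for (int i = 0; i < columnCount; i++) {
            indexByName.putIfAbsent(columnNames.get(i), i);
        }
    }

    /** Expected O(1) lookup; returns -1 for a field that is not a table column. */
    int indexOf(String fieldName) {
        Integer idx = indexByName.get(fieldName);
        return idx == null ? -1 : idx;
    }

    /** A blank input line yields a row with every column set to null. */
    List<Object> blankRow() {
        return new ArrayList<>(Collections.<Object>nCopies(columnCount, null));
    }

    /** Binary values arrive as base-64 text inside the JSON document. */
    static byte[] decodeBinary(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The map trades a small amount of memory per SerDe instance for avoiding a linear scan on each field; whether the real implementation normalizes case or handles duplicate names differently is not stated in this message.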

This message was sent by Atlassian JIRA