hadoop-common-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4138) [Hive] refactor the SerDe library
Date Tue, 16 Sep 2008 05:47:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631258#action_12631258 ]

Zheng Shao commented on HADOOP-4138:

Actually, I did implement caching for all object inspectors, not just the reflection ones but
the standard ones as well.
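To illustrate what caching "standard" (non-reflection) inspectors means, here is a minimal sketch, assuming simplified hypothetical types (`ObjectInspector`, `PrimitiveInspector`, `ListInspector`, `Factory`) rather than the real Hive classes. Each factory method returns one shared instance per distinct argument, so callers asking for the same type get the same object back.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified types; these are NOT Hive's actual ObjectInspector classes.
class CachedInspectorFactoryDemo {

    interface ObjectInspector {
        String getTypeName();
    }

    // One primitive inspector per primitive class.
    static final class PrimitiveInspector implements ObjectInspector {
        private final Class<?> primitiveClass;
        PrimitiveInspector(Class<?> primitiveClass) { this.primitiveClass = primitiveClass; }
        public String getTypeName() { return primitiveClass.getSimpleName(); }
    }

    // A "standard" list inspector parameterized by its element inspector.
    static final class ListInspector implements ObjectInspector {
        private final ObjectInspector elementInspector;
        ListInspector(ObjectInspector elementInspector) { this.elementInspector = elementInspector; }
        ObjectInspector getElementInspector() { return elementInspector; }
        public String getTypeName() { return "list<" + elementInspector.getTypeName() + ">"; }
    }

    static final class Factory {
        private static final Map<Class<?>, PrimitiveInspector> PRIMITIVE_CACHE =
            new ConcurrentHashMap<>();
        // Keyed by the element inspector instance (identity equals/hashCode):
        // that is enough because every element inspector also comes from this factory.
        private static final Map<ObjectInspector, ListInspector> LIST_CACHE =
            new ConcurrentHashMap<>();

        static PrimitiveInspector getPrimitiveInspector(Class<?> c) {
            return PRIMITIVE_CACHE.computeIfAbsent(c, PrimitiveInspector::new);
        }

        static ListInspector getListInspector(ObjectInspector element) {
            return LIST_CACHE.computeIfAbsent(element, ListInspector::new);
        }
    }
}
```

Because every inspector is obtained through the factory, two requests for `list<String>` return the same instance, which is what lets the list cache key on the element inspector by identity.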

I took a second look at the code in the factory. Every function except one is only about 10
lines long, which is probably about as much code as equals() and hashCode() would take if we
chose to implement those instead of caching. (People might also point out that recursive
implementations of equals() and hashCode() are potentially inefficient; caching all instances
eliminates that cost.)
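For comparison, the equals/hashCode alternative might look like the following minimal sketch, again using hypothetical simplified inspector classes. The comparison is structural and recursive: checking two deeply nested list inspectors walks every nesting level, whereas canonical (cached) instances can be compared with a single `==`.

```java
// Hypothetical sketch of the alternative to caching: value-based equals/hashCode.
// These classes are illustrative only, not Hive's actual inspectors.
class StructuralEqualityDemo {

    interface ObjectInspector {}

    static final class PrimitiveInspector implements ObjectInspector {
        final Class<?> primitiveClass;
        PrimitiveInspector(Class<?> primitiveClass) { this.primitiveClass = primitiveClass; }

        @Override public boolean equals(Object o) {
            return o instanceof PrimitiveInspector
                && ((PrimitiveInspector) o).primitiveClass == primitiveClass;
        }

        @Override public int hashCode() { return primitiveClass.hashCode(); }
    }

    static final class ListInspector implements ObjectInspector {
        final ObjectInspector element;
        ListInspector(ObjectInspector element) { this.element = element; }

        // Recursive: comparing list<list<list<int>>> descends through every level,
        // and hashCode does the same. Cached, canonical instances avoid this work.
        @Override public boolean equals(Object o) {
            return o instanceof ListInspector && element.equals(((ListInspector) o).element);
        }

        @Override public int hashCode() { return 31 * element.hashCode() + 1; }
    }
}
```

Both approaches need similar amounts of code per inspector type; the difference is that structural equality pays the recursive cost on every comparison or hash lookup.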

The only long function (ObjectInspectorFactory.getReflectionObjectInspectorNoCache) exists
to let ReflectionObjectInspectors work with recursive types (e.g., linked lists or trees),
and that requires caching.
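Why recursive types force caching can be shown with a minimal sketch. It assumes a hypothetical, much simplified `ReflectionInspector` and `getInspector` method, not the actual body of getReflectionObjectInspectorNoCache. The key step is registering the new inspector in the cache *before* visiting its fields, so a self-referential field such as `Node.next` resolves to the instance under construction instead of recursing forever.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not Hive's ObjectInspectorFactory API.
class RecursiveReflectionDemo {

    // Example recursive type: a singly linked list node.
    static class Node {
        int value;
        Node next;
    }

    static final class ReflectionInspector {
        final Class<?> type;
        // Filled in after the inspector is cached, so cycles can be closed.
        final List<String> fieldNames = new ArrayList<>();
        final List<ReflectionInspector> fieldInspectors = new ArrayList<>();
        ReflectionInspector(Class<?> type) { this.type = type; }
    }

    private static final Map<Class<?>, ReflectionInspector> CACHE = new HashMap<>();

    static synchronized ReflectionInspector getInspector(Class<?> type) {
        ReflectionInspector cached = CACHE.get(type);
        if (cached != null) {
            return cached; // may still be under construction if we are inside a cycle
        }
        ReflectionInspector oi = new ReflectionInspector(type);
        // Register BEFORE visiting fields: Node.next then finds this same
        // instance in the cache instead of triggering infinite recursion.
        CACHE.put(type, oi);
        if (!type.isPrimitive() && type != String.class) {
            for (Field f : type.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                oi.fieldNames.add(f.getName());
                oi.fieldInspectors.add(getInspector(f.getType()));
            }
        }
        return oi;
    }
}
```

Without the cache lookup, `getInspector(Node.class)` would call itself for `next` indefinitely; with it, the `next` field's inspector is the `Node` inspector itself.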

For developers, the semantics of these functions are also clear from their names. In the
implementation, the only tricky part is the handling of recursive types, which we cannot get
rid of unless we drop support for recursive types.

So, even setting performance aside, I am not sure we could simplify the code much.

But I do agree that one thing can be improved: the organization of the code. The recursive
logic can be moved into the reflection ObjectInspector, the common caching part can stay in
the factory, and a signature class would let me merge all the caching code into one place. I
can work on that after these large commits land, since the code organization is purely
internal and does not change the APIs.
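One way the proposed signature class could merge the per-kind caches is sketched below. This is a hypothetical design under stated assumptions: `Signature`, `Kind`, and `getOrCreate` are illustrative names, not existing Hive code. Each inspector kind describes itself as a signature (kind plus components), and a single map serves every kind.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of a "signature" key that lets one cache serve every inspector kind.
class SignatureCacheDemo {

    interface ObjectInspector {}

    enum Kind { PRIMITIVE, LIST, MAP }

    static final class Signature {
        final Kind kind;
        // e.g. a Class, or child inspectors (compared by identity, since they are cached too)
        final List<Object> components;

        Signature(Kind kind, Object... components) {
            this.kind = kind;
            this.components = Arrays.asList(components);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Signature)) return false;
            Signature s = (Signature) o;
            return kind == s.kind && components.equals(s.components);
        }

        @Override public int hashCode() { return Objects.hash(kind, components); }
    }

    private static final Map<Signature, ObjectInspector> CACHE = new ConcurrentHashMap<>();

    // The single shared caching path; each kind only supplies how to build a new instance.
    static ObjectInspector getOrCreate(Signature sig, Supplier<ObjectInspector> builder) {
        return CACHE.computeIfAbsent(sig, k -> builder.get());
    }

    static final class PrimitiveInspector implements ObjectInspector {
        final Class<?> primitiveClass;
        PrimitiveInspector(Class<?> c) { this.primitiveClass = c; }
    }

    static final class ListInspector implements ObjectInspector {
        final ObjectInspector element;
        ListInspector(ObjectInspector element) { this.element = element; }
    }

    static final class MapInspector implements ObjectInspector {
        final ObjectInspector key, value;
        MapInspector(ObjectInspector key, ObjectInspector value) { this.key = key; this.value = value; }
    }

    static ObjectInspector primitive(Class<?> c) {
        return getOrCreate(new Signature(Kind.PRIMITIVE, c), () -> new PrimitiveInspector(c));
    }

    static ObjectInspector list(ObjectInspector element) {
        return getOrCreate(new Signature(Kind.LIST, element), () -> new ListInspector(element));
    }

    static ObjectInspector map(ObjectInspector key, ObjectInspector value) {
        return getOrCreate(new Signature(Kind.MAP, key, value), () -> new MapInspector(key, value));
    }
}
```

With this layout, adding a new inspector kind means adding one small factory method; the caching logic itself lives in `getOrCreate` only.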

> [Hive] refactor the SerDe library
> ---------------------------------
>                 Key: HADOOP-4138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4138
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.19.0
>         Attachments: HADOOP-4138-1.txt, HADOOP-4138-2.txt, HADOOP-4138-3.txt, HADOOP-4138-4.txt,
> Hive uses the library from src/contrib/hive/serde to do serialization/deserialization.
> We want to do a refactoring of the library to:
> 1. Split Serializer and Deserializer interface
> 2. Split Serializer/Deserializer and ObjectInspector interface
> 3. Change hive/metaserver and hive/ql to use the new SerDe framework
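Goals 1 and 2 above can be illustrated with a minimal sketch. The interfaces below are hypothetical and heavily simplified (a `String` stands in for serialized bytes, and only list-shaped objects are supported); they are not the real Hive SerDe signatures. Separating the ObjectInspector from the SerDe means a serializer can write any in-memory representation for which an inspector exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified shapes; NOT Hive's actual SerDe interfaces.
class SplitSerDeDemo {

    // Describes how to navigate an in-memory object, independent of any byte format.
    interface ListObjectInspector {
        int getListLength(Object data);
        Object getListElement(Object data, int index);
    }

    // Bytes -> object; callers use the returned inspector to read the object.
    interface Deserializer {
        Object deserialize(String blob);
        ListObjectInspector getObjectInspector();
    }

    // Object -> bytes; callers pass the inspector describing the object.
    interface Serializer {
        String serialize(Object obj, ListObjectInspector oi);
    }

    // Inspector for objects represented as java.util.List.
    static final class JavaListInspector implements ListObjectInspector {
        public int getListLength(Object data) { return ((List<?>) data).size(); }
        public Object getListElement(Object data, int i) { return ((List<?>) data).get(i); }
    }

    // Inspector for objects represented as Object[].
    static final class ArrayInspector implements ListObjectInspector {
        public int getListLength(Object data) { return ((Object[]) data).length; }
        public Object getListElement(Object data, int i) { return ((Object[]) data)[i]; }
    }

    // A toy comma-separated format; the two directions are independent classes.
    static final class CsvDeserializer implements Deserializer {
        public Object deserialize(String blob) {
            return new ArrayList<Object>(Arrays.asList(blob.split(",", -1)));
        }
        public ListObjectInspector getObjectInspector() { return new JavaListInspector(); }
    }

    static final class CsvSerializer implements Serializer {
        public String serialize(Object obj, ListObjectInspector oi) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < oi.getListLength(obj); i++) {
                if (i > 0) sb.append(',');
                sb.append(oi.getListElement(obj, i));
            }
            return sb.toString();
        }
    }
}
```

Here `CsvSerializer` never casts to a concrete type: given an `ArrayInspector` it writes an `Object[]`, and given a `JavaListInspector` it writes a `List`.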

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
