hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HIVE-337) LazySimpleSerDe should support array and map types
Date Tue, 24 Mar 2009 20:59:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688599#action_12688599 ]

Zheng Shao edited comment on HIVE-337 at 3/24/09 1:58 PM:
----------------------------------------------------------

I have addressed all the review comments except comment 6.

I also renamed the setAll() function to init() to make it clearer.

Because we now pass TypeInfo around in the LazyObject hierarchy, we no longer need to create
the LazyObject for an array element that is never accessed; we can create it on demand the
first time it is accessed.
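
The on-demand creation described above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: LazyArraySketch, getElement() and elementsCreated
are hypothetical names, a plain String stands in for the element's LazyObject, TypeInfo is
left out, and the separator is handed to the constructor for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the LazyArray class from the HIVE-337 patch.
final class LazyArraySketch {
    private final byte separator;   // assumption: passed in directly for brevity
    private byte[] bytes;
    private int start;
    private int length;
    private boolean parsed;
    private int[] elementStart;     // offset of each element, plus one sentinel entry
    private String[] elements;      // stand-ins for per-element LazyObjects
    int elementsCreated;            // counts how many element objects were built

    LazyArraySketch(byte separator) {
        this.separator = separator;
    }

    // init() only remembers the slice; nothing is parsed or allocated per element.
    void init(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
        this.parsed = false;
        this.elementStart = null;
        this.elements = null;
    }

    // Locate element boundaries once, on first access.
    private void parse() {
        List<Integer> starts = new ArrayList<>();
        starts.add(start);
        int end = start + length;
        for (int i = start; i < end; i++) {
            if (bytes[i] == separator) {
                starts.add(i + 1);
            }
        }
        starts.add(end + 1);        // sentinel: one past an imaginary trailing separator
        elementStart = new int[starts.size()];
        for (int i = 0; i < elementStart.length; i++) {
            elementStart[i] = starts.get(i);
        }
        elements = new String[elementStart.length - 1];
        parsed = true;
    }

    // The element object is created only when this index is first requested.
    String getElement(int index) {
        if (!parsed) {
            parse();
        }
        if (index < 0 || index >= elements.length) {
            return null;
        }
        if (elements[index] == null) {
            int from = elementStart[index];
            int len = elementStart[index + 1] - from - 1;
            elements[index] = new String(bytes, from, len, StandardCharsets.UTF_8);
            elementsCreated++;
        }
        return elements[index];
    }
}
```

With the input "a,bb,ccc", calling getElement(1) builds only the object for "bb"; the other
two elements are never materialized unless they are requested.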

The current code works fine without change 6. Making that change would require either 12 more
bytes of storage per primitive object (adding a byte[], an int, and an int to LazyPrimitive),
or more complicated logic to remove the int start and int length from LazyNonPrimitive. In the
latter case we would have to parse the data right in init(..), but we don't have access to the
separators there because they live in the next-level ObjectInspectors. We could add pointers
from LazyObject to ObjectInspector, but that is extra overhead and complicates the data structure.

After all, the implementation of init() is private to each class, and I don't think there is
a strong need to make it the same across LazyPrimitive and LazyNonPrimitive. The fact that
parsing a LazyPrimitive does not require delimiters while parsing a LazyNonPrimitive does
is reason enough for them to have different implementations.
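
To make the difference concrete, here is a rough sketch of the two init() styles. It is an
assumption-laden illustration rather than the patch itself: LazyObjectSketch, LazyIntSketch
and LazyListSketch are hypothetical names, and the separator is passed to size() to stand in
for a value that would come from an ObjectInspector at access time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical base type; the real hierarchy in the patch may differ.
abstract class LazyObjectSketch {
    abstract void init(byte[] bytes, int start, int length);
}

// Primitive: needs no delimiter, so init() can parse at once and keep only
// the value. No byte[], start, or length fields are stored.
final class LazyIntSketch extends LazyObjectSketch {
    private int value;
    private boolean isNull = true;

    @Override
    void init(byte[] bytes, int start, int length) {
        try {
            value = Integer.parseInt(
                new String(bytes, start, length, StandardCharsets.US_ASCII));
            isNull = false;
        } catch (NumberFormatException e) {
            isNull = true;          // unparsable text is treated as null here
        }
    }

    Integer get() {
        return isNull ? null : value;
    }
}

// Non-primitive: init() cannot parse because the separator is unknown at that
// point, so it keeps the slice (byte[], start, length) and parses later.
final class LazyListSketch extends LazyObjectSketch {
    private byte[] bytes;
    private int start;
    private int length;
    private boolean parsed;
    private int count;

    @Override
    void init(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
        this.parsed = false;
    }

    // Assumption: the same separator is supplied on every call for a given slice.
    int size(byte separator) {
        if (!parsed) {
            count = (length == 0) ? 0 : 1;
            for (int i = start; i < start + length; i++) {
                if (bytes[i] == separator) {
                    count++;
                }
            }
            parsed = true;
        }
        return count;
    }
}
```

The three fields kept by LazyListSketch are the ones that change 6 would either add to every
primitive or remove from every non-primitive; the 12-byte figure is consistent with a 4-byte
reference plus two 4-byte ints, though that breakdown is an assumption.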


Future improvements include:
1. Support escaping: HIVE-136;
2. Columnar storage: HIVE-352;
3. Use Writable/Text for values: HIVE-266;
4. Short-circuit serialization: HIVE-358;
5. Short-circuit expression evaluation: HIVE-359;
6. Common expression evaluation: HIVE-364.



  
> LazySimpleSerDe should support array and map types
> --------------------------------------------------
>
>                 Key: HIVE-337
>                 URL: https://issues.apache.org/jira/browse/HIVE-337
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.2.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Blocker
>         Attachments: HIVE-337.1.patch, HIVE-337.2.patch, HIVE-337.5.patch
>
>
> Once we do that, we can completely deprecate DynamicSerDe/TCTLSeparatedProtocol, and
> close any bugs that DynamicSerDe/TCTLSeparatedProtocol has.


