hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-640) Add LazyBinarySerDe to Hive
Date Fri, 31 Jul 2009 00:43:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737380#action_12737380 ]

Zheng Shao commented on HIVE-640:

LazyBinaryByte etc.: The copy constructors should copy the value of the data as well. LazyBinaryString
and LazyBinaryBoolean already do that, but the others do not.
It's also OK to remove these copy constructors altogether, since they are never used.
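
To illustrate the first option, here is a minimal sketch of a copy constructor that copies the data as well as the state flag. LazyByteSketch and its fields are hypothetical stand-ins, not the classes in the patch.

```java
// Hypothetical stand-in for a lazy primitive wrapper; not the class from the patch.
final class LazyByteSketch {
    private byte value;
    private boolean initialized;

    LazyByteSketch() {
    }

    // Copy constructor: copies the data value too, not only the bookkeeping state.
    LazyByteSketch(LazyByteSketch other) {
        this.value = other.value;
        this.initialized = other.initialized;
    }

    void set(byte v) {
        this.value = v;
        this.initialized = true;
    }

    byte get() {
        return value;
    }

    boolean isInitialized() {
        return initialized;
    }
}
```

A copy made this way keeps its own value if the source object is later reset to point at different data.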

LazyBinaryArray.java:150: We should get ((ListObjectInspector)oi).getListElementObjectInspector()
and save it to a local variable, so we don't need to call the function again and again.
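
A rough sketch of the idea follows. The two interfaces are simplified stand-ins declared here so the example compiles on its own; only the method name getListElementObjectInspector() is taken from the comment above, and CountingListInspector exists purely to show how often the lookup happens.

```java
import java.util.ArrayList;
import java.util.List;

final class ElementInspectorHoistSketch {
    // Simplified stand-ins for the real inspector interfaces (assumption).
    interface ObjectInspector {
    }

    interface ListObjectInspector extends ObjectInspector {
        ObjectInspector getListElementObjectInspector();
    }

    // Hypothetical inspector that counts lookups, for demonstration only.
    static final class CountingListInspector implements ListObjectInspector {
        int calls = 0;
        private final ObjectInspector element = new ObjectInspector() {
        };

        @Override
        public ObjectInspector getListElementObjectInspector() {
            calls++;
            return element;
        }
    }

    // Looks up the element inspector once, then reuses it for every element.
    static List<ObjectInspector> inspectorsPerElement(ObjectInspector oi, int elementCount) {
        ObjectInspector elementOI = ((ListObjectInspector) oi).getListElementObjectInspector();
        List<ObjectInspector> result = new ArrayList<>(elementCount);
        for (int i = 0; i < elementCount; i++) {
            result.add(elementOI); // no cast and no method call inside the loop
        }
        return result;
    }
}
```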

LazyBinaryUtils.java:295: "Returns the lazy binary object inspector that can be used to translate
an object of that typeInfo to a standard object type. " -> "Returns the lazy binary object
inspector that can be used to inspect a lazy binary object of that typeInfo."

LazyBinaryString.java: The length of the string can be stored in a VInt instead of fixed size
4 bytes. This will help us save some space (especially for short strings).
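
To make the saving concrete, here is a small self-contained sketch of a variable-length length prefix. It uses a generic base-128 encoding (7 data bits per byte, high bit as a continuation flag), which is an assumption for illustration and not necessarily the byte layout of the VInt the patch would use; all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the patch.
final class VarLengthPrefixSketch {

    // Writes a non-negative int using 7 data bits per byte;
    // the high bit of each byte says whether more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads a value written by writeVarInt, starting at the given offset.
    static int readVarInt(byte[] buf, int offset) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = buf[offset++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Encodes a string as a variable-length byte count followed by its UTF-8 bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

With this encoding, any string shorter than 128 bytes carries a 1-byte prefix instead of 4 bytes, so "hi" takes 3 bytes rather than 6.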

LazyBinarySerDe.java: Remove the commented-out code (Line 464). It seems the only difference between
serializing the row and serializing a struct is that we don't need the total number of bytes
at the beginning. Can we add a parameter to serialize(Output byteStream, Object obj, ObjectInspector
objInspector), and have serialize(Object o, OI oi) call that directly?
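
One possible shape for that refactoring is sketched below. Everything here is hypothetical: the struct is reduced to an int array, the size prefix is a fixed 4-byte int, and the parameter is called writeSize, none of which comes from the patch. The point is only that the row path can reuse the struct path with the size prefix switched off.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch; field types and method names are assumptions.
final class SerializeSharedPathSketch {

    // Shared worker: writes the fields, optionally preceded by the total byte count.
    static void serializeStruct(ByteArrayOutputStream out, int[] fields, boolean writeSize) {
        ByteBuffer body = ByteBuffer.allocate(4 * fields.length);
        for (int f : fields) {
            body.putInt(f);
        }
        int size = body.position();
        if (writeSize) {
            out.write(ByteBuffer.allocate(4).putInt(size).array(), 0, 4);
        }
        out.write(body.array(), 0, size);
    }

    // Row-level entry point: same layout as a struct, minus the size prefix.
    static byte[] serializeRow(int[] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeStruct(out, fields, false);
        return out.toByteArray();
    }
}
```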

> Add LazyBinarySerDe to Hive
> ---------------------------
>                 Key: HIVE-640
>                 URL: https://issues.apache.org/jira/browse/HIVE-640
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Yuntao Jia
>         Attachments: HIVE-640.1.patch, HIVE-640.2.patch
> LazyBinarySerDe will serialize the data in binary format while supporting LazyDeserialization.
> This will be used as the SerDe for value between map and reduce, and also between different map-reduce jobs.
> This will help improve the performance of Hive a lot.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
