hadoop-hive-dev mailing list archives

From "Yuntao Jia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-640) Add LazyBinarySerDe to Hive
Date Tue, 21 Jul 2009 22:37:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733866#action_12733866 ]

Yuntao Jia commented on HIVE-640:
---------------------------------

The following changes need to be made to the source code:

1/ Create "LazyBinary" classes for all data types, in a class hierarchy similar to that of the existing "Lazy" classes.
    The top-level class is LazyBinaryObject, which has two child classes: LazyBinaryPrimitive and LazyBinaryNonPrimitive.
    LazyBinaryPrimitive has one child class for each primitive type, such as LazyBinaryBoolean for Boolean and LazyBinaryInteger for Integer.
    LazyBinaryNonPrimitive has 4 child classes for the 4 complex data types in Hive: LazyBinaryStruct, LazyBinaryMap, LazyBinaryList and LazyBinaryString.
    The reason for having LazyBinaryPrimitive and its child classes is that we can then easily define a LazyBinaryList as a list of LazyBinaryPrimitive objects.
2/ Create LazyBinary object inspectors for those LazyBinary classes that handle complex data types:
     LazyBinaryStructObjectInspector, LazyBinaryMapObjectInspector, LazyBinaryListObjectInspector and LazyBinaryStringObjectInspector.
3/ Create the LazyBinarySerDe class, which serializes a data stream to the LazyBinary format and later deserializes it back into LazyBinary types.
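
To make the hierarchy in step 1 concrete, here is a rough Java sketch. Only the class names come from the plan above. The fields, the method names (init, get, decode, size) and the byte layout (4-byte big-endian integers, UTF-8 strings, a list of fixed-width integers) are assumptions made for illustration and are not the actual design. LazyBinaryStruct, LazyBinaryMap, the object inspectors and the SerDe itself are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class names follow the plan above; every field,
// method and byte layout here is an assumption, not Hive's actual code.
abstract class LazyBinaryObject {
    protected byte[] bytes;    // shared buffer holding the serialized data
    protected int start;       // offset of this object's bytes in the buffer
    protected int length;      // number of bytes that belong to this object
    protected boolean parsed;  // stays false until a value is first requested

    // Remember the byte range without decoding it (the "lazy" part).
    void init(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
        this.parsed = false;
    }
}

abstract class LazyBinaryPrimitive<T> extends LazyBinaryObject {
    private T value;

    protected abstract T decode();

    // Decode on first access only, then reuse the cached value.
    T get() {
        if (!parsed) {
            value = decode();
            parsed = true;
        }
        return value;
    }
}

abstract class LazyBinaryNonPrimitive extends LazyBinaryObject {
}

class LazyBinaryInteger extends LazyBinaryPrimitive<Integer> {
    @Override
    protected Integer decode() {
        // Assumed layout: 4 bytes, big-endian (the ByteBuffer default).
        return ByteBuffer.wrap(bytes, start, length).getInt();
    }
}

class LazyBinaryBoolean extends LazyBinaryPrimitive<Boolean> {
    @Override
    protected Boolean decode() {
        // Assumed layout: one byte, non-zero means true.
        return bytes[start] != 0;
    }
}

// Placed under LazyBinaryNonPrimitive because the plan above lists it there.
class LazyBinaryString extends LazyBinaryNonPrimitive {
    private String value;

    String get() {
        if (!parsed) {
            // Assumed layout: raw UTF-8 bytes.
            value = new String(bytes, start, length, StandardCharsets.UTF_8);
            parsed = true;
        }
        return value;
    }
}

// A list defined as a list of LazyBinaryPrimitive objects. For brevity the
// elements are fixed-width integers; a real format would need type and
// length information for each element.
class LazyBinaryList extends LazyBinaryNonPrimitive {
    private final List<LazyBinaryPrimitive<Integer>> elements = new ArrayList<>();

    // Split the byte range into elements; the elements stay undecoded.
    private void parse() {
        elements.clear();
        for (int off = start; off + 4 <= start + length; off += 4) {
            LazyBinaryInteger element = new LazyBinaryInteger();
            element.init(bytes, off, 4);
            elements.add(element);
        }
        parsed = true;
    }

    int size() {
        if (!parsed) {
            parse();
        }
        return elements.size();
    }

    LazyBinaryPrimitive<Integer> get(int index) {
        if (!parsed) {
            parse();
        }
        return elements.get(index);
    }
}
```

With this sketch, calling init() on a LazyBinaryList only records the byte range. The range is split into elements on the first size() or get() call, and each element decodes its own bytes only when its get() is called.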

> Add LazyBinarySerDe to Hive
> ---------------------------
>
>                 Key: HIVE-640
>                 URL: https://issues.apache.org/jira/browse/HIVE-640
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Yuntao Jia
>
> LazyBinarySerDe will serialize the data in binary format while supporting lazy deserialization.
> This will be used as the SerDe for values passed between map and reduce, and also between different map-reduce jobs.
> This will help improve the performance of Hive a lot.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

