hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-553) Add LazyBinarySerDe to Hive
Date Wed, 10 Jun 2009 04:11:07 GMT
Add LazyBinarySerDe to Hive
---------------------------

                 Key: HIVE-553
                 URL: https://issues.apache.org/jira/browse/HIVE-553
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.4.0
            Reporter: Zheng Shao


Currently the most popular SerDe in Hive is LazySimpleSerDe. LazySimpleSerDe has the benefit
of being simple (it stores data in text format), but its performance may suffer in the
following cases:
1. Double values are stored in text format, which is very space-inefficient, and both
serialization and deserialization are slow;
2. For complex-typed columns with many levels of nesting, we scan the buffer once per level,
which is very inefficient.
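
To make the first point concrete, here is a rough sketch (in Python, for brevity) comparing the
size of a double stored as text with the same value stored as a fixed-width IEEE 754 binary
value. The helper names text_size and binary_size are made up for this illustration and are
not part of Hive; the text form uses Python's shortest round-trip repr, which is an
assumption and may differ from what LazySimpleSerDe actually writes.

```python
import struct


def text_size(value: float) -> int:
    """Bytes needed to store the double as UTF-8 text (shortest round-trip form)."""
    return len(repr(value).encode("utf-8"))


def binary_size(value: float) -> int:
    """Bytes needed to store the double as a big-endian IEEE 754 binary64 value."""
    return len(struct.pack(">d", value))


if __name__ == "__main__":
    for v in (3.141592653589793, 0.1, 1.0e-300):
        print(v, "text:", text_size(v), "bytes, binary:", binary_size(v), "bytes")
```

The binary form is always 8 bytes and can be decoded without parsing digits, while the text
form grows with the number of significant digits and must be parsed character by character.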

We should add a binary SerDe that stores the data in binary format. The format should
have the following properties:
1. Compact: it should be space-efficient;
2. Fast: it should be efficient to deserialize the data, especially for double values and
complex types;
3. Null-aware: it should support serializing NULL values.
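
One possible way to meet the three properties above is sketched below (in Python, for
brevity). This is only an illustration under assumed choices, not the proposed
LazyBinarySerDe layout: a leading null bitmap (one bit per column), 8-byte big-endian
doubles, and strings prefixed with a 4-byte length so a reader can skip a value without
scanning it. The names serialize_row and deserialize_row and the type tags "double" and
"string" are hypothetical.

```python
import struct
from typing import List, Optional, Union

Value = Optional[Union[float, str]]


def serialize_row(row: List[Value]) -> bytes:
    """Encode a row as: null bitmap, then the non-NULL values in column order."""
    bitmap = bytearray((len(row) + 7) // 8)  # bit set means "value present"
    body = bytearray()
    for i, value in enumerate(row):
        if value is None:
            continue  # NULL costs only its (cleared) bit in the bitmap
        bitmap[i // 8] |= 1 << (i % 8)
        if isinstance(value, float):
            body += struct.pack(">d", value)  # fixed 8 bytes
        else:
            data = value.encode("utf-8")
            body += struct.pack(">I", len(data)) + data  # length prefix, then bytes
    return bytes(bitmap) + bytes(body)


def deserialize_row(buf: bytes, types: List[str]) -> List[Value]:
    """Decode a row written by serialize_row, given the column types."""
    pos = (len(types) + 7) // 8  # values start right after the bitmap
    row: List[Value] = []
    for i, col_type in enumerate(types):
        if not buf[i // 8] & (1 << (i % 8)):
            row.append(None)
            continue
        if col_type == "double":
            (value,) = struct.unpack_from(">d", buf, pos)
            pos += 8
        else:
            (length,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        row.append(value)
    return row


if __name__ == "__main__":
    encoded = serialize_row([1.5, None, "hive"])
    print(len(encoded), deserialize_row(encoded, ["double", "double", "string"]))
```

The length prefix is what avoids the once-per-level scan: a nested value encoded this way
can be skipped or sliced out in one step, since its size is known before its contents are
read.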


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

