hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <>
Subject [jira] Updated: (HIVE-43) [Hive] Port Hive's serialization/deserialization to the new Serialization framework
Date Mon, 01 Dec 2008 17:13:46 GMT


Ashish Thusoo updated HIVE-43:

    Component/s: Serializers/Deserializers

> [Hive] Port Hive's serialization/deserialization to the new Serialization framework
> -----------------------------------------------------------------------------------
>                 Key: HIVE-43
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Pete Wyckoff
> Problem 1: legacy data
> This is non-trivial because legacy Hive data is written as BytesWritable in the
> SequenceFile value. The specific RecordIO/Thrift/X class name is stored in the metastore.

> If we write our own SequenceFileRecordReader, this is trivial, but the standard reader
> assumes the SequenceFile header records the actual class name, so we cannot deserialize at
> this level: we would just get back a BytesWritable. We need the SequenceFileRecordReader to
> consult the Deserializer about the actual class being deserialized.
> I don't know whether writing data as plain BytesWritable and storing the real class
> somewhere else is a common pattern, but for us it is an issue.
> Otherwise, a ThriftSerialization set of classes is coming soon, and we can add equivalent
> ones for our other SerDes.
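A minimal Java sketch of the reader-side lookup Problem 1 calls for, assuming a much-simplified model: the metastore is a plain map from table name to record class name, and each deserializer is a `Function<byte[], Object>`. `MetastoreAwareReader`, `readValue`, the `page_views` table, `com.example.PageView`, and the Ctrl-A separated payload are hypothetical illustrations, not Hive or Hadoop APIs. The point is only that when the SequenceFile header names `org.apache.hadoop.io.BytesWritable`, the reader has to ask the metastore for the real class before it can choose a deserializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch only: a reader for legacy data whose SequenceFile header names
 * BytesWritable while the real record class lives in the metastore.
 * All names here are hypothetical, not Hive or Hadoop APIs.
 */
public class MetastoreAwareReader {
    static final String BYTES_WRITABLE = "org.apache.hadoop.io.BytesWritable";

    private final Map<String, String> metastore;                       // table -> real class name (hypothetical)
    private final Map<String, Function<byte[], Object>> deserializers; // class name -> deserializer (hypothetical)

    public MetastoreAwareReader(Map<String, String> metastore,
                                Map<String, Function<byte[], Object>> deserializers) {
        this.metastore = metastore;
        this.deserializers = deserializers;
    }

    /** Decodes one value, consulting the metastore when the header only says BytesWritable. */
    public Object readValue(String table, String headerValueClass, byte[] rawValue) {
        // A standard reader would trust headerValueClass; for legacy data that only yields bytes.
        String realClass = BYTES_WRITABLE.equals(headerValueClass)
                ? metastore.get(table)
                : headerValueClass;
        if (realClass == null) {
            throw new IllegalStateException("metastore has no record class for table " + table);
        }
        Function<byte[], Object> deserializer = deserializers.get(realClass);
        if (deserializer == null) {
            throw new IllegalStateException("no deserializer registered for " + realClass);
        }
        return deserializer.apply(rawValue);
    }

    public static void main(String[] args) {
        // Hypothetical table whose rows were written as Ctrl-A separated text inside BytesWritable.
        Map<String, String> metastore = Map.of("page_views", "com.example.PageView");
        Function<byte[], Object> pageViewDeserializer =
                bytes -> Arrays.asList(new String(bytes, StandardCharsets.UTF_8).split("\u0001"));
        Map<String, Function<byte[], Object>> deserializers =
                Map.of("com.example.PageView", pageViewDeserializer);

        MetastoreAwareReader reader = new MetastoreAwareReader(metastore, deserializers);
        byte[] raw = "42\u0001/index.html".getBytes(StandardCharsets.UTF_8);
        Object row = reader.readValue("page_views", BYTES_WRITABLE, raw);
        System.out.println(row); // prints [42, /index.html]
    }
}
```

In a real system this lookup would live inside a SequenceFileRecordReader; the sketch leaves Hadoop out so it runs on its own (Java 9+ for `Map.of`).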
> Problem 2: DynamicSerDe
> This is a serializer/deserializer that takes a Thrift DDL at *runtime* and can serialize and
> deserialize both Thrift and non-Thrift data. Thus the class name DynamicSerDe alone doesn't
> give you what you need, namely the DDL and the protocol used for serialization: Binary or
> Control Separated (in theory also JSON, XML, ...).
> We can store this DDL in the metastore (and we do), but then DynamicSerDe can only be used
> with Hive. Maybe we should output only to TFiles, where we could put the DDL in the file's
> metadata.
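A minimal Java sketch of why the class name is not enough for DynamicSerDe-style data, assuming the DDL and protocol travel with the file as simple key/value metadata (a stand-in for the TFile metadata idea, not the TFile API). `DynamicRecordSketch`, `fieldNames`, `deserialize`, and the `Protocol` enum are hypothetical. The DDL parser is a toy regex that only handles flat `N: type name` fields, and only the control-separated protocol is decoded, with Ctrl-A (`\001`) assumed as the field separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: the same deserializer class needs a DDL and a protocol to decode a row.
 * Here both come from hypothetical per-file metadata. Nothing here is the actual Hive API.
 */
public class DynamicRecordSketch {
    enum Protocol { BINARY, CTRL_SEPARATED }

    // Toy DDL parser: pulls field names out of "struct T { 1: i32 id, 2: string url }".
    // It handles only this flat form and is not a real Thrift IDL parser.
    static List<String> fieldNames(String ddl) {
        Matcher m = Pattern.compile("\\d+\\s*:\\s*\\w+\\s+(\\w+)").matcher(ddl);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Decodes one row using the DDL (field names) and the protocol (wire format).
    static Map<String, String> deserialize(String ddl, Protocol protocol, byte[] bytes) {
        if (protocol != Protocol.CTRL_SEPARATED) {
            // Decoding Thrift's binary protocol is out of scope for this sketch.
            throw new UnsupportedOperationException("sketch only handles CTRL_SEPARATED");
        }
        List<String> names = fieldNames(ddl);
        String[] values = new String(bytes, StandardCharsets.UTF_8).split("\u0001", -1);
        if (values.length != names.size()) {
            throw new IllegalArgumentException(
                    "expected " + names.size() + " fields, got " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            row.put(names.get(i), values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical file metadata: the DDL and protocol are stored with the data,
        // so a reader does not need to consult the Hive metastore.
        Map<String, String> fileMetadata = Map.of(
                "ddl", "struct PageView { 1: i32 id, 2: string url }",
                "protocol", "CTRL_SEPARATED");
        byte[] raw = "42\u0001/index.html".getBytes(StandardCharsets.UTF_8);
        Map<String, String> row = deserialize(
                fileMetadata.get("ddl"),
                Protocol.valueOf(fileMetadata.get("protocol")),
                raw);
        System.out.println(row); // prints {id=42, url=/index.html}
    }
}
```

Swapping either the DDL string or the protocol value changes how the same bytes are read, which is the information a bare class name cannot carry.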

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
