hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/DeveloperGuide" by ZhengShao
Date Wed, 07 Jan 2009 03:34:36 GMT

The following page has been changed by ZhengShao:

    * DynamicSerDe: This SerDe also reads and writes thrift serialized objects, but it understands
thrift DDL, so the schema of the object can be provided at runtime.  It also supports many
different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which
writes data in delimited records).
  How to write your own SerDe:
-   * In most cases, users want to write a Deserializer instead of a SerDe.
+   * In most cases, users want to write a Deserializer instead of a SerDe, because they
only need to read their own data format, not write to it.
    * For example, the RegexDeserializer will deserialize the data using the configuration
parameter 'regex', and possibly a list of column names (see serde2.MetadataTypedColumnsetSerDe).
Please see serde2/Deserializer.java for details.
+   * If your SerDe supports DDL (that is, a SerDe with parameterized columns and column types),
you probably want to implement a Protocol based on DynamicSerDe instead of writing a SerDe
from scratch. The reason is that the framework passes DDL to the SerDe in "thrift DDL" format,
and writing a "thrift DDL" parser is non-trivial.
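
The regex-driven deserializer described in the list above can be sketched roughly as follows. This is a simplified stand-in and does not implement Hive's serde2 Deserializer interface: the class name RegexDeserializerSketch, the 'columns' property, and the String-based method signatures are assumptions made for illustration. Only the 'regex' configuration parameter comes from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch; NOT Hive's serde2.Deserializer API.
public class RegexDeserializerSketch {
    private Pattern pattern;
    private List<String> columnNames;

    // Reads the 'regex' parameter and an assumed comma-separated 'columns' parameter.
    public void initialize(Properties tbl) {
        pattern = Pattern.compile(tbl.getProperty("regex"));
        String cols = tbl.getProperty("columns", "");
        columnNames = cols.isEmpty()
            ? new ArrayList<String>()
            : Arrays.asList(cols.split(","));
    }

    public List<String> getColumnNames() {
        return columnNames;
    }

    // Returns one string per capture group, or null when the row does not match.
    public List<String> deserialize(String row) {
        Matcher m = pattern.matcher(row);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<String>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("regex", "(\\S+) (\\d+)");
        tbl.setProperty("columns", "host,status");

        RegexDeserializerSketch d = new RegexDeserializerSketch();
        d.initialize(tbl);
        System.out.println(d.getColumnNames() + " -> " + d.deserialize("example.org 200"));
    }
}
```

Note that the sketch only reads data; there is no serialize method, which matches the point that most users need a Deserializer rather than a full SerDe.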
+ Some important points of SerDe:
+   * The SerDe, not the DDL, defines the table schema. Some SerDe implementations use the DDL
for configuration, but a SerDe can also override it.
+   * Column types can be arbitrarily nested arrays, maps and structures.
+   * The callback design of ObjectInspector allows lazy deserialization: with CASE/IF expressions,
or with complex or nested types, only the fields that are actually accessed need to be deserialized.
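
To illustrate the last point, here is a minimal sketch of how a callback-style inspector can defer field parsing until an expression asks for a value. It does not use Hive's real ObjectInspector API: LazyRow, FieldInspector, evalIf, and the comma-separated integer row format are all hypothetical names and assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of callback-driven lazy deserialization; NOT Hive's ObjectInspector API.
public class LazyInspectorSketch {

    // Keeps the raw text of each field and converts a field to Integer only on first access.
    public static final class LazyRow {
        private final String[] raw;
        private final Map<Integer, Integer> parsed = new HashMap<Integer, Integer>();
        public int parseCount = 0;

        public LazyRow(String line) {
            this.raw = line.split(",", -1);
        }

        Integer field(int i) {
            Integer v = parsed.get(i);
            if (v == null) {
                parseCount++;                 // count how many fields were really converted
                v = Integer.valueOf(raw[i]);
                parsed.put(i, v);
            }
            return v;
        }
    }

    // The callback: the evaluator asks the inspector for a field instead of reading the row itself.
    public interface FieldInspector {
        Object getField(Object row, int index);
    }

    public static final FieldInspector INSPECTOR = new FieldInspector() {
        public Object getField(Object row, int index) {
            return ((LazyRow) row).field(index);
        }
    };

    // Evaluates IF(col0 > 0, col1, col2); only the chosen branch is pulled through the inspector.
    public static Object evalIf(Object row, FieldInspector oi) {
        int cond = (Integer) oi.getField(row, 0);
        return cond > 0 ? oi.getField(row, 1) : oi.getField(row, 2);
    }

    public static void main(String[] args) {
        LazyRow row = new LazyRow("1,10,20");
        Object result = evalIf(row, INSPECTOR);
        System.out.println("result=" + result + ", fields parsed=" + row.parseCount);
    }
}
```

In this sketch the third column is never converted, because the IF expression takes the first branch and never requests it through the inspector.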
  === MetaStore ===
