incubator-hcatalog-commits mailing list archives

From "Tom White (JIRA)" <>
Subject [jira] [Updated] (HCATALOG-49) Support Avro Data File Format in HCatalog
Date Mon, 20 Jun 2011 05:17:50 GMT


Tom White updated HCATALOG-49:

    Attachment: HCATALOG-49.patch

Here is an initial attempt to support Avro in HCatalog.

Some notes:

* For output, an Avro schema is computed for the HCatalog schema by the Avro output storage
driver. The current patch does not allow you to specify a custom Avro schema - this would
be a natural extension.
* Avro map keys must be strings, whereas they can be any type in HCatalog. The current implementation
assumes that HCatalog maps have string keys, and fails if this is not true. It might be possible
to relax this restriction in the future by doing type conversion.
* In HCatalog, values can be null, whereas this is not true for simple schemas in Avro. It
would be possible to generate null unions in Avro, but this isn't done here. This could be
a future enhancement.
* For the Avro input storage driver, the Avro schema in the Avro Data File is checked for
compatibility with the HCatalog schema, and an exception is thrown if there's a mismatch.
* Byte arrays cannot be represented in HCatalog, so there is no way to read byte arrays from
Avro files. (Pig has the same limitation.)
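To make the output-side notes concrete, here is a minimal sketch of how an Avro schema might be computed from an HCatalog-style schema. It is not the code in the patch: the type descriptions, the PRIMITIVES table, and the function names to_avro and record_schema are all hypothetical, and the schema is built as plain Python data rather than with the Avro library. It shows the string-key restriction on maps and, via the optional nullable flag, what generating null unions could look like.

```python
import json

# Hypothetical mapping from HCatalog primitive type names to Avro type names.
# This table is an assumption for illustration, not taken from the patch.
PRIMITIVES = {
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "string",
}


def to_avro(hcat_type, nullable=False):
    """Convert a toy HCatalog type description into an Avro schema (Python data).

    A primitive is a string such as "int"; a complex type is a dict with a
    "kind" of "map", "array", or "struct". This layout is made up for the sketch.
    """
    if isinstance(hcat_type, str):
        if hcat_type not in PRIMITIVES:
            raise ValueError("unsupported type: %s" % hcat_type)
        avro = PRIMITIVES[hcat_type]
    else:
        kind = hcat_type["kind"]
        if kind == "map":
            # Avro maps always have string keys, so anything else is rejected.
            if hcat_type["key"] != "string":
                raise ValueError("Avro map keys must be strings")
            avro = {"type": "map", "values": to_avro(hcat_type["value"], nullable)}
        elif kind == "array":
            avro = {"type": "array", "items": to_avro(hcat_type["element"], nullable)}
        elif kind == "struct":
            avro = record_schema(hcat_type["name"], hcat_type["fields"], nullable)
        else:
            raise ValueError("unsupported kind: %s" % kind)
    # Optionally wrap the type in a union with null so values may be null.
    return ["null", avro] if nullable else avro


def record_schema(name, fields, nullable=False):
    """Build an Avro record schema from a list of (field name, type) pairs."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": n, "type": to_avro(t, nullable)} for n, t in fields],
    }


example = record_schema(
    "row",
    [("id", "int"), ("tags", {"kind": "map", "key": "string", "value": "bigint"})],
)
print(json.dumps(example, indent=2))
```

Passing nullable=True wraps every field type in a ["null", ...] union, which is one way the null-handling enhancement mentioned above could be approached. A map with a non-string key raises ValueError, mirroring the failure described for the current implementation.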

> Support Avro Data File Format in HCatalog
> -----------------------------------------
>                 Key: HCATALOG-49
>                 URL:
>             Project: HCatalog
>          Issue Type: New Feature
>            Reporter: Tom White
>         Attachments: HCATALOG-49.patch
> Add input and output drivers for Avro.

This message is automatically generated by JIRA.

