hadoop-hive-dev mailing list archives

From "Alex Rovner (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-895) Add SerDe for Avro serialized data
Date Sun, 18 Jul 2010 04:13:53 GMT

    [ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889558#action_12889558 ]

Alex Rovner commented on HIVE-895:

Can someone please explain to me how this SerDe would work?

Specifically, how would it deserialize the data?

From what I understand, an Avro file has a header that defines the data stored in
the file. To deserialize the data you need to read that header, which is a challenge
with Hive's Deserializer interface because the initialize() method knows nothing about
the input file. (Note: there is a hack that can get you the file by reading the map.input
hadoop property.... This hack is not good enough in Hive, however, because someone might be
using the CLI to run a query that does not trigger a map reduce job.)

Does anyone know a good solution to this issue?

I am actually trying to implement a different file format, but the idea behind our format is
similar to Avro's: each file has a header that contains a "schema".
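
To make the problem concrete, here is a minimal sketch of a self-describing file in Java. The layout is an assumption made purely for illustration (a header holding comma-separated column names, followed by a record count and the field values); it is neither Avro's actual container format nor the format mentioned above, and the class and method names are hypothetical. It shows why a reader cannot interpret any record until it has read the header of that specific file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical self-describing format (NOT Avro's real layout):
//   header: one UTF string of comma-separated column names (the "schema")
//   body:   an int record count, then one UTF string per field
public class HeaderSchemaSketch {

    // Serialize the schema into the header, then the rows.
    public static byte[] write(List<String> columns, List<List<String>> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(String.join(",", columns));
            out.writeInt(rows.size());
            for (List<String> row : rows) {
                for (String field : row) {
                    out.writeUTF(field);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The schema is only available after reading the start of the file.
    public static List<String> readSchema(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            return Arrays.asList(in.readUTF().split(","));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Records can be decoded only once the header has told us the column count.
    public static List<List<String>> readRows(byte[] file) {
        List<List<String>> rows = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int columnCount = in.readUTF().split(",").length;
            int recordCount = in.readInt();
            for (int i = 0; i < recordCount; i++) {
                List<String> row = new ArrayList<>();
                for (int c = 0; c < columnCount; c++) {
                    row.add(in.readUTF());
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] file = write(
                Arrays.asList("id", "name"),
                Arrays.asList(Arrays.asList("1", "alice"), Arrays.asList("2", "bob")));
        System.out.println(readSchema(file));
        System.out.println(readRows(file));
    }
}
```

Both read methods need the file's bytes before they can do anything, which is exactly the information a table-level initialize() call does not have.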


> Add SerDe for Avro serialized data
> ----------------------------------
>                 Key: HIVE-895
>                 URL: https://issues.apache.org/jira/browse/HIVE-895
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Jeff Hammerbacher
> As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems
> like a solid win.

