avro-user mailing list archives

From Ran S <r...@liveperson.com>
Subject using Avro unions with HIVE
Date Thu, 23 May 2013 14:15:23 GMT
Hi,
We have started working with Avro in CDH4 and querying the Avro files with Hive.
This works fine for us, except for unions.
We do not understand how to query the data inside a union using Hive.

For example, let's look at the following schema:

{
	"type":"record", 
	"name":"event", 
	"namespace":"com.mysite",
	"fields":[
    {
        "name":"header",
        "type":{
            "type":"record", "name":"CommonHeader",
            "fields":[{ "name":"eventTimeStamp", "type":"long", "default":-1 },
                      { "name":"globalUserId", "type":["null", "string"], "default":null } ]
        },
        "default":null
    },
    {
        "name":"eventbody",
        "type":{
            "type":"record", "name":"eventbody",
            "fields":[
                {
                    "name":"body",
                    "type":[
                       "null", 
                       {
                        "type":"record",
                        "name":"event1",
                        "fields":[
                            {
                                "name":"event1Header",
                                "type":["null", { "type":"array", "items":"string" }], "default":null
                            },
                            {
                                "name":"event1Body",
                                "type":["null", { "type":"array", "items":"string" }], "default":null
                            }
                        ]
                    }, 
                   {
                        "type":"record",
                        "name":"event2",
                        "fields":[
                            {
                                "name":"page",
                                "type":{
                                    "type":"record", "name":"URL", "fields":[{ "name":"url", "type":"string" }]
                                },
                                "default":null
                            },
                            {
                                "name":"referrer", "type":"string", "default":null
                            }
                        ]
                    }
                    ],
                    "default":null
                }
            ]
        },
        "default":null
    }
]}

Note that "body" is a union of three types:
null, "event1" and "event2"
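
To make the shape of that union concrete, here is a minimal sketch in Python (standard library only) that lists the branches of "body" in declaration order. It assumes a trimmed copy of the schema above with the inner record fields omitted, and `union_branches` is a hypothetical helper written for this illustration, not part of Avro or Hive. It only inspects the schema and does not show how Hive exposes the union.

```python
import json

# Trimmed copy of the "eventbody" record from the schema above (assumption:
# the inner record fields are left out because only the union branches matter).
SCHEMA_JSON = """
{
  "type": "record",
  "name": "eventbody",
  "fields": [
    {
      "name": "body",
      "type": [
        "null",
        {"type": "record", "name": "event1", "fields": []},
        {"type": "record", "name": "event2", "fields": []}
      ],
      "default": null
    }
  ]
}
"""


def union_branches(field):
    """Hypothetical helper: return a union field's branch names in order."""
    field_type = field["type"]
    if not isinstance(field_type, list):
        raise ValueError(f"field {field['name']!r} is not a union")
    names = []
    for branch in field_type:
        # Primitive branches are plain strings; named types are JSON objects.
        names.append(branch if isinstance(branch, str) else branch["name"])
    return names


schema = json.loads(SCHEMA_JSON)
body_field = next(f for f in schema["fields"] if f["name"] == "body")
branches = union_branches(body_field)

# Print each branch with its zero-based position in the union.
for position, name in enumerate(branches):
    print(position, name)
```

The position printed next to each name is the zero-based index of the branch within the union, which is how Avro's binary encoding identifies the branch a value belongs to.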

So if I want to query fields inside event1, I first need to access it.
I would then write a HiveQL query like this:
SELECT eventbody.body.??? from SRC

My question is: what should I put in place of the ??? above to make this work?

Thank you,
Ran



