hive-user mailing list archives

From Sumit Kumar <>
Subject Re: XML Serde
Date Mon, 25 Jun 2012 09:20:49 GMT
So I found this discussion on this topic, and it makes more sense now.
I will post my final resolution.

On Sun, Jun 24, 2012 at 10:39 PM, Sumit Kumar <> wrote:

> Hi,
> I looked for a generic approach to handling XML files in Hive but
> found none, so I thought I could use the concepts from json-serde
> to create a generic XML SerDe. XPath came to mind immediately, and it
> should play the same role that JSON parsing plays in json-serde. The
> problem is the use case where a single XML file contains multiple rows
> of interest, as in the example below.
> <root>
>  <book> ... </book>
>  <book> ... </book>
>  <book> ... </book>
> </root>
> In this case, the SerDe should generate three rows, one for each book node.
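A minimal sketch of the XPath idea from the message, using only the JDK's javax.xml.xpath API and not a Hive SerDe: one XPath expression (here /root/book) selects the row nodes, and a list of relative XPaths picks one column value from each node. The class name XPathRowSplitter and the columns title and author are hypothetical, since the book contents are elided in the example. It also assumes the whole document is available as one string, which Hive's default line-oriented text input does not hand to a SerDe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: fans one XML document out into several rows via XPath.
public class XPathRowSplitter {
    // Each node matched by rowXPath becomes one row; each column is the string
    // value of a column XPath evaluated relative to that node.
    public static List<List<String>> split(String xml, String rowXPath, String[] columnXPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(rowXPath, doc, XPathConstants.NODESET);
            List<List<String>> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                List<String> row = new ArrayList<>();
                for (String column : columnXPaths) {
                    // evaluate(String, Object) returns the string value ("" if nothing matches)
                    row.add(xpath.evaluate(column, node));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            // Wraps the parser and XPath checked exceptions for brevity.
            throw new IllegalArgumentException("could not split XML into rows", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<book><title>A</title><author>X</author></book>"
                + "<book><title>B</title><author>Y</author></book>"
                + "<book><title>C</title><author>Z</author></book>"
                + "</root>";
        for (List<String> row : split(xml, "/root/book", new String[] {"title", "author"})) {
            System.out.println(row);
        }
    }
}
```

Running main prints [A, X], [B, Y] and [C, Z], one row per book node.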
> I looked at the json-serde implementation, but there the deserialize step
> returns an ArrayList instance with the column values set at its indices,
> and that one instance maps to one row. I see that deserialize can
> return any Java Object, but I am not sure what the appropriate way is
> to return multiple rows, one per book node. I'm going to give it a shot
> anyway, but I thought I would ask the community whether somebody has
> already tried this or has a better approach. I would really appreciate
> any input. If I succeed, I will share my code; if not, I will come
> back anyway :-)
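One workaround, not necessarily the resolution this thread reached, is to move the splitting out of the SerDe: let the input format cut the file into one record per <book>...</book> fragment (the idea behind Mahout's XmlInputFormat, for example), so that each deserialize call again sees exactly one row and can return a single ArrayList, as json-serde does. The sketch below is plain Java, not the Hadoop or Hive API. TagRecordSplitter and splitRecords are hypothetical, an in-memory stand-in for such a record reader, and the deserialize method here is hypothetical too (Hive's real Deserializer.deserialize takes a Writable). It assumes row elements appear literally as <book> with no attributes and are never nested.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helpers: split first, then turn each record into exactly one row.
public class TagRecordSplitter {
    // Cuts the text into records that start with startTag and end with endTag (both kept).
    // Assumes the row element is written literally as startTag, e.g. "<book>".
    public static List<String> splitRecords(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record at end of input: drop it
            }
            int stop = end + endTag.length();
            records.add(text.substring(start, stop));
            from = stop;
        }
        return records;
    }

    // Hypothetical per-record step: one fragment in, one ArrayList (one row) out.
    // Column XPaths are evaluated relative to the fragment's root element.
    public static ArrayList<Object> deserialize(String record, String[] columnXPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(record)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            ArrayList<Object> row = new ArrayList<>();
            for (String column : columnXPaths) {
                row.add(xpath.evaluate(column, doc.getDocumentElement()));
            }
            return row;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse record: " + record, e);
        }
    }

    public static void main(String[] args) {
        String file = "<root>\n"
                + " <book><title>A</title><author>X</author></book>\n"
                + " <book><title>B</title><author>Y</author></book>\n"
                + " <book><title>C</title><author>Z</author></book>\n"
                + "</root>\n";
        for (String record : splitRecords(file, "<book>", "</book>")) {
            System.out.println(deserialize(record, new String[] {"title", "author"}));
        }
    }
}
```

Running main prints one line per book: [A, X], [B, Y] and [C, Z]. A real record reader would also have to handle fragments that straddle input split boundaries, which this in-memory version ignores.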
> Thanks in advance.
> -Sumit
