hive-user mailing list archives

From Sumit Kumar <ksumi...@gmail.com>
Subject Re: XML Serde
Date Mon, 25 Jun 2012 09:20:49 GMT
So I found this earlier discussion on the topic:
http://mail-archives.apache.org/mod_mbox/hive-user/201006.mbox/%3CAANLkTikYL3HinOwFO36YEYId9VOJyH_6pe3slORHyKWI@mail.gmail.com%3E
It makes more sense now. I will post my final resolution.

On Sun, Jun 24, 2012 at 10:39 PM, Sumit Kumar <ksumitus@gmail.com> wrote:

> Hi,
>
> I looked for a generic approach to handling XML files in Hive but
> found none, so I thought I could use the concepts from json-serde (
> http://code.google.com/p/hive-json-serde/) to create a generic XML
> SerDe. XPath immediately came to mind and should work the same way
> that JSON works for json-serde. The problem is the use case where a
> single XML file contains multiple rows of interest, as in the example
> shown below.
>
> <root>
>  <book> ... </book>
>  <book> ... </book>
>  <book> ... </book>
> </root>
>
> In this case, the SerDe is supposed to generate three rows, one for each
> book node. I looked at the json-serde implementation, but there the
> deserialize step returns an ArrayList instance with the column values set
> at its indices, and this one instance maps to one row. I do see that the
> deserialize step can return any Java Object, but I am not sure what the
> appropriate way would be to return multiple rows, one per book node. I'm
> going to give it a shot anyway, but I thought I would ask the community
> whether somebody has already tried this or has a better approach. I would
> really appreciate any input. If I succeed, I will share my code; if not, I
> will come back anyway :-)
>
> Thanks in advance.
> -Sumit
>
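
For illustration, here is a minimal sketch of the "one row per book node" idea from the message above, written as plain Java with the JDK's XPath API and no Hive dependency. It is not the resolution from this thread and not a SerDe. It only shows that one XPath can select the row nodes and further XPaths, evaluated relative to each row node, can fill the columns. The class name XmlRowSplitter, the method toRows, and the child elements title and author are hypothetical, since the original example elides the contents of each book. In Hive, where the deserialize step maps one record to one row, this kind of splitting would most likely have to happen before the SerDe is called, for example in the input format's record reader. That placement is an assumption, not something this thread confirms.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one XML document into many rows.
final class XmlRowSplitter {

    // rowXPath selects the nodes that each become one row (e.g. "/root/book").
    // columnXPaths are evaluated relative to each row node, one per column.
    static List<List<String>> toRows(String xml, String rowXPath,
                                     List<String> columnXPaths) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rowNodes =
                (NodeList) xpath.evaluate(rowXPath, doc, XPathConstants.NODESET);

        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < rowNodes.getLength(); i++) {
            Node rowNode = rowNodes.item(i);
            // Same shape as the json-serde row: a list of column values.
            List<String> row = new ArrayList<>();
            for (String columnXPath : columnXPaths) {
                // A column with no matching node yields an empty string.
                row.add(xpath.evaluate(columnXPath, rowNode));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // The title and author elements are made up for this example.
        String xml = "<root>"
                + "<book><title>A</title><author>X</author></book>"
                + "<book><title>B</title><author>Y</author></book>"
                + "<book><title>C</title><author>Z</author></book>"
                + "</root>";
        for (List<String> row : toRows(xml, "/root/book",
                                       Arrays.asList("title", "author"))) {
            System.out.println(row);
        }
    }
}
```

Each inner list has the same shape as the single ArrayList that json-serde returns from deserialize, so a record reader that emitted one book element per record could hand each one to a per-row deserializer unchanged.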
