Hello, 
   I'm new to Hadoop. 
   I have a large quantity of JSON documents with a structure similar to what is shown below.  

   {
     g : "some-group-identifier",
     sg: "some-subgroup-identifier",
     j      : "some-job-identifier",
     page     : 23,
     ... // other fields omitted
     important-data : [
         {
           f1  : "abc",
           f2  : "a",
           f3  : "/"
           ...
         },
         ...
         {
           f1 : "xyz",
           f2  : "q",
           f3  : "/",
           ... 
         },
     ],
    ... // other fields omitted 
     other-important-data : [
        {
           x1  : "ford",
           x2  : "green",
           x3  : 35,
           map : {
               "free-field" : "value",
               "other-free-field" : "value2"
              }
         },
         ...
         {
           x1 : "vw",
           x2  : "red",
           x3  : 54,
           ... 
         },    
     ]
   }
 

Each file contains a single gzip-compressed JSON document; uncompressed, each document is roughly 200 KB of pretty-printed JSON text.
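
To make the input format concrete, this is roughly how I would read one of these files outside Hadoop. It is only a sketch in Python: the function name load_document is made up for illustration, and it assumes the real files are strict JSON (quoted keys, no elisions), unlike the abbreviated sample above.

```python
import gzip
import json


def load_document(path):
    """Read one gzip-compressed file and parse it as a single JSON document."""
    # "rt" opens the compressed stream in text mode so json.load can read it.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```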

I am interested in analyzing only the "important-data" array and the "other-important-data" array.
My source data would be easier to analyze if it looked like a couple of tables with a fixed set of columns. Only the "map" column would be complex; all the others would be primitives.

( g, sg, j, page, f1, f2, f3 )
 
( g, sg, j, page, x1, x2, x3, map )

So, for each JSON document, I would like to "create" several rows, but I would like to avoid the intermediate step of persisting -and duplicating- the "flattened" data.
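
In case the intent is unclear, here is a small Python sketch of the flattening I have in mind. It is not Hadoop code; the names KEY_FIELDS and flatten and the sample values are invented for illustration. It uses a generator so that rows are produced on the fly rather than stored.

```python
# Document-level fields repeated on every output row.
KEY_FIELDS = ("g", "sg", "j", "page")


def flatten(doc, array_name, columns):
    """Yield one tuple per element of doc[array_name], prefixed by the key fields."""
    key = tuple(doc.get(name) for name in KEY_FIELDS)
    for item in doc.get(array_name, []):
        # Columns missing from an element come out as None.
        yield key + tuple(item.get(column) for column in columns)


# Hypothetical sample document with the same shape as the one shown earlier.
sample = {
    "g": "G1", "sg": "SG1", "j": "J1", "page": 23,
    "important-data": [
        {"f1": "abc", "f2": "a", "f3": "/"},
        {"f1": "xyz", "f2": "q", "f3": "/"},
    ],
    "other-important-data": [
        {"x1": "ford", "x2": "green", "x3": 35, "map": {"free-field": "value"}},
    ],
}

rows_f = list(flatten(sample, "important-data", ("f1", "f2", "f3")))
rows_x = list(flatten(sample, "other-important-data", ("x1", "x2", "x3", "map")))
```

Here rows_f holds two rows of seven columns and rows_x holds one row of eight columns, with the "map" column kept as a nested structure.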

To avoid persisting the flattened data, I thought I would have to write my own MapReduce job in Java, but I discovered that others have had the same problem of using JSON as the source and that there are somewhat "standard" solutions.

From reading about the SerDe approach for Hive, I get the impression that each JSON document is transformed into a single "row" of the table, with some columns being an array, a map, or other nested structures.
a) Is there a way to break each JSON document into several "rows" for a Hive external table?
b) There seem to be too many JSON SerDe libraries! Is any of them considered the de facto standard?

The Pig approach using Elephant Bird also seems promising. Does anybody have pointers to more user documentation on this project? Or is browsing through the examples on GitHub my only source?

Thanks