hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From TJay Belt <tjayb...@yahoo.com>
Subject Fw: help with getting the summary of a list of numbers from a json document
Date Tue, 17 Mar 2015 22:15:08 GMT




 
I have a json document that has several sub documents. I can query this doc and pull out numbers
from a repetitive subdocument. The result is a list within brackets. I want to sum these numbers,
and cannot figure out how.

Here is a sample document. 
{    "_id": "005f2c46-4a70-4fed-a8e9-a445011d2aba",    "Revision": 7,    "ActivityCountedCollection":
[        {            "Lesson": "98d66ab9-1ef4-4b61-a05d-857b3e07e0f8",     
      "DateTime": "2015-03-02T14:11:05.5407801+00:00",            "ElapsedSeconds":
66.80226        },        {            "Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70",
            "DateTime": "2015-03-02T14:18:29.7132608+00:00",
            "ElapsedSeconds": 435.24593        },        {           
"Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70",
            "DateTime": "2015-03-02T14:21:38.4381108+00:00",
            "ElapsedSeconds": 182.47075        }    ]}


I can use the following hive code from within HDInsight to pull out the values. 
DROP TABLE IF EXISTS HDStudents;CREATE EXTERNAL TABLE HDStudents(  json_response STRING)STORED
AS TEXTFILE LOCATION 'wasb://blob@tjaystorage.blob.core.windows.net/HDStudentData';
DROP TABLE IF EXISTS HDStudent_Usage;CREATE TABLE HDStudent_Usage(  id STRING,  sec STRING);
FROM HDStudentsINSERT OVERWRITE TABLE HDStudent_UsageSELECT    CAST(get_json_object(json_response,
'$.id') as STRING),    get_json_object(json_response, '$.ActivityCountedCollection.ElapsedSeconds')
as STRING;
and the result ends up being something like the following
    \N[66.80226,435.24593,182.47075]

I dont know whats happening to my ID, but that is not important now. But the numbers from
the above example come out as a list, within brackets. Its not even formed as JSon anymore,
so the previous tricks to pull them out of their key value pair do not apply. Since it is
reading it as a string, with commas and numbers, i cannot perform aggregate functions on it.Online,
i get suggestions to create some maven compiled code function that is referenced, and i get
lost attempting this too. I thought about pushing these results into yet another table, and
doing some processing on it. But do not know how to extract the items from the list, or array,
or whatever it is in now. 

Help?



   
Mime
View raw message