Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3532510699 for ; Tue, 17 Mar 2015 22:15:42 +0000 (UTC) Received: (qmail 80962 invoked by uid 500); 17 Mar 2015 22:15:39 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 80890 invoked by uid 500); 17 Mar 2015 22:15:39 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 80877 invoked by uid 99); 17 Mar 2015 22:15:39 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Mar 2015 22:15:39 +0000 X-ASF-Spam-Status: No, hits=3.2 required=5.0 tests=FORGED_YAHOO_RCVD,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of tjaybelt@yahoo.com designates 98.139.213.138 as permitted sender) Received: from [98.139.213.138] (HELO nm18-vm0.bullet.mail.bf1.yahoo.com) (98.139.213.138) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Mar 2015 22:15:11 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1426630509; bh=jy0Xdll3tRuPW5NztD747jUE8J6OMfYgzzvK05ahVmw=; h=Date:From:Reply-To:To:In-Reply-To:References:Subject:From:Subject; b=ogLpIrvO9K0BI1aD57g5bdH4ADtUqrJkPQ1My6N1+/jRLwEfs307B2bJZOBa82WvxiIQc+ROJmPWEhyj4duG/X0mckERrABw8p23zDTQ/1Lfp7bHylNF6tKCiQcJDkVtXMee/4pGXeVrW3RgZByvvI7I9T8DBtfJPpV4MW1RiF1C8W0gcysBO+XkSgmCtEEh1kihPoxA6/yIWfUXWCyDJ1t5wsxVxVmeEx7nbNzg+pnntbhaK8ZEa4YrOc/z16YR+ubO8wTt3CPEMfrulYh1RJRtvIo/apLVZ22xXHSReJBh2r4LoC2FYQ3i1Zb9RCqG7nA8hSMSBUr17JaXw8Qkog== Received: from [98.139.170.180] by nm18.bullet.mail.bf1.yahoo.com with NNFMP; 17 Mar 2015 22:15:09 -0000 Received: from [98.139.212.231] by tm23.bullet.mail.bf1.yahoo.com with NNFMP; 17 Mar 2015 22:15:09 -0000 Received: from [127.0.0.1] by omp1040.mail.bf1.yahoo.com with NNFMP; 17 Mar 2015 22:15:09 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 204604.83141.bm@omp1040.mail.bf1.yahoo.com X-YMail-OSG: 6nmKvB8VM1lyS7QDajI2xHAESLMUoxW0CWCGQb92WqgHvkxCPApE7Q7W_mS8IUQ VSlDquaeOmLHJ7uIeNCbIrAkZpT30Gw3ECGmoAPNkVKWlYafeQCvfrU3ZSzYzkiVMzs2PZ_DlafT vPgg1IvzFrcpy0u7P9ZlH77PYEHvIe_yjh2aTLuyWG.MFbQzK3gDi0onWpov9wRvIFyg.WXRd82E ActRtyqhw_C7SjMM30ugx5nTU_c.m8AUUOgtKWT4IH9LsqkbEqpcEDBnJpMnTD2mZmN5QvG4.qAz vF_F3kp6XjwoU5ZLma_UCOxvISamr9JrW.KBunpBmi1zFTG0UuuE7zgl2nByYvKVxMqJj4_VGJ6h .y49smXSRt9m9qXV7wsvZV.RTda4Y_Acff6ig79ncBZKvVyQUzgmFSHhZbUXki3MdInKIIsJ2uaK CYdvcUzONnkNIjXG5qnHbIX60sXMiP79iq8PGIJxUqTNwRJaVW5uEPnRtLgiHWCFxhSU1T39Q1lF BoFet Received: by 76.13.27.132; Tue, 17 Mar 2015 22:15:08 +0000 Date: Tue, 17 Mar 2015 22:15:08 +0000 (UTC) From: TJay Belt Reply-To: TJay Belt To: "user@hive.apache.org" Message-ID: <1573395686.121974.1426630508252.JavaMail.yahoo@mail.yahoo.com> In-Reply-To: <404844947.1981781.1426629641637.JavaMail.yahoo@mail.yahoo.com> References: <404844947.1981781.1426629641637.JavaMail.yahoo@mail.yahoo.com> Subject: Fw: help with getting the summary of a list of numbers from a json document MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_121973_970627527.1426630508241" X-Virus-Checked: Checked by ClamAV on apache.org ------=_Part_121973_970627527.1426630508241 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable =20 I have a json document that has several sub documents. I can query this doc= and pull out numbers from a repetitive subdocument. The result is a list w= ithin brackets. I want to sum these numbers, and cannot figure out how. Here is a sample document.=C2=A0 {=C2=A0 =C2=A0 "_id": "005f2c46-4a70-4fed-a8e9-a445011d2aba",=C2=A0 =C2=A0 = "Revision": 7,=C2=A0 =C2=A0 "ActivityCountedCollection": [=C2=A0 =C2=A0 =C2= =A0 =C2=A0 {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Lesson": "98d66ab9-1= ef4-4b61-a05d-857b3e07e0f8",=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Date= Time": "2015-03-02T14:11:05.5407801+00:00",=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 "ElapsedSeconds": 66.80226=C2=A0 =C2=A0 =C2=A0 =C2=A0 },=C2=A0 = =C2=A0 =C2=A0 =C2=A0 {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "Lesson": "= 30698aac-5a3d-4464-935c-16de4ba9db70", =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "DateTime": "2015-03-02T14:18:29.= 7132608+00:00", =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "ElapsedSeconds": 435.24593=C2=A0= =C2=A0 =C2=A0 =C2=A0 },=C2=A0 =C2=A0 =C2=A0 =C2=A0 {=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 "Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70", =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "DateTime": "2015-03-02T14:21:38.= 4381108+00:00", =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "ElapsedSeconds": 182.47075=C2=A0= =C2=A0 =C2=A0 =C2=A0 }=C2=A0 =C2=A0 ]} I can use the following hive code from within HDInsight to pull out the val= ues.=C2=A0 DROP TABLE IF EXISTS HDStudents;CREATE EXTERNAL TABLE HDStudents(=C2=A0 jso= n_response STRING)STORED AS TEXTFILE LOCATION 'wasb://blob@tjaystorage.blob= .core.windows.net/HDStudentData'; DROP TABLE IF EXISTS HDStudent_Usage;CREATE TABLE HDStudent_Usage(=C2=A0 id= STRING,=C2=A0 sec STRING); FROM HDStudentsINSERT OVERWRITE TABLE HDStudent= _UsageSELECT=C2=A0 =C2=A0 CAST(get_json_object(json_response, '$.id') as ST= RING),=C2=A0 =C2=A0 get_json_object(json_response, '$.ActivityCountedCollec= tion.ElapsedSeconds') as STRING; and the result ends up being something like the following =C2=A0=C2=A0=C2=A0=C2=A0\N[66.80226,435.24593,182.47075] I dont know whats happening to my ID, but that is not important now. But th= e numbers from the above example come out as a list, within brackets. Its n= ot even formed as JSon anymore, so the previous tricks to pull them out of = their key value pair do not apply.=C2=A0Since it is reading it as a string,= with commas and numbers, i cannot perform aggregate functions on it.Online= , i get suggestions to create some maven compiled code function that is ref= erenced, and i get lost attempting this too.=C2=A0I thought about pushing t= hese results into yet another table, and doing some processing on it. But d= o not know how to extract the items from the list, or array, or whatever it= is in now.=C2=A0 Help? ------=_Part_121973_970627527.1426630508241 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable





I have a json document that has several sub documents. = I can query this doc and pull out numbers from a repetitive subdocument. Th= e result is a list within brackets. I want to sum these numbers, and cannot= figure out how.


Here is a sample = document. 

{
    "_id": "005f2c46-4a70-4fed-a8e9-a445011d2a= ba",
    "Revision": 7,
    "ActivityCountedCollection":= [
        {
=
            "= Lesson": "98d66ab9-1ef4-4b61-a05d-857b3e07e0f8",
            "DateTime": "2015-03= -02T14:11:05.5407801+00:00",
  &n= bsp;         "ElapsedSeconds": 66.80226
        },
        {
            "Lesson": "30698aac-5a3d= -4464-935c-16de4ba9db70",
            "DateT= ime": "2015-03-02T14:18:29.7132608+00:00",
        &nb= sp;   "ElapsedSeconds": 435.24593
        },
 =       {
    =         "Lesson": "30698aac-5a3d-4464-935c-16de4ba9db70= ",
            "DateTime": "2015-03-02T14:= 21:38.4381108+00:00",
            "ElapsedS= econds": 182.47075
     = ;   }
    ]
}



I c= an use the following hive code from within HDInsight to pull out the values= . 

DROP TABLE IF EXISTS HDStude= nts;
CREATE EXTERNAL TABLE HDStudents<= /div>
(
  json_response STRING
)
STORED AS TEXTFILE LOCATION 'wasb://blob@tjay= storage.blob.core.windows.net/HDStudentData';

DROP TABLE IF EXISTS HDStudent_Usage;
CREATE TABLE HDStudent_Usage
(
  id STRING,
  sec STRING
= );
=09
FROM = HDStudents
INSERT OVERWRITE TABLE HDSt= udent_Usage
SELECT
    CAST(get_json_object(json_response, '$.id')= as STRING),
    get_json_ob= ject(json_response, '$.ActivityCountedCollection.ElapsedSeconds') as STRING= ;

and the result ends up being somet= hing like the following

  =   \N[66.80226,435.24593,182.47075]

I dont know whats happening= to my ID, but that is not important now. But the numbers from the above ex= ample come out as a list, within brackets. Its not even formed as JSon anym= ore, so the previous tricks to pull them out of their key value pair do not= apply. 
Since it is reading it a= s a string, with commas and numbers, i cannot perform aggregate functions o= n it.
Online, i get suggestions to cre= ate some maven compiled code function that is referenced, and i get lost at= tempting this too. 
I thought abo= ut pushing these results into yet another table, and doing some processing = on it. But do not know how to extract the items from the list, or array, or= whatever it is in now. 


Help= ?



------=_Part_121973_970627527.1426630508241--