Return-Path: X-Original-To: apmail-incubator-drill-user-archive@minotaur.apache.org Delivered-To: apmail-incubator-drill-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3A57617854 for ; Tue, 11 Nov 2014 22:14:22 +0000 (UTC) Received: (qmail 29704 invoked by uid 500); 11 Nov 2014 22:14:21 -0000 Delivered-To: apmail-incubator-drill-user-archive@incubator.apache.org Received: (qmail 29647 invoked by uid 500); 11 Nov 2014 22:14:21 -0000 Mailing-List: contact drill-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: drill-user@incubator.apache.org Delivered-To: mailing list drill-user@incubator.apache.org Received: (qmail 29636 invoked by uid 99); 11 Nov 2014 22:14:21 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Nov 2014 22:14:21 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of hzhu@maprtech.com designates 74.125.82.47 as permitted sender) Received: from [74.125.82.47] (HELO mail-wg0-f47.google.com) (74.125.82.47) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Nov 2014 22:14:17 +0000 Received: by mail-wg0-f47.google.com with SMTP id a1so12700279wgh.20 for ; Tue, 11 Nov 2014 14:13:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=jO/1AsDgezviiHBd5M6PTaDXuySc6DZ+U22GUwnHP+Q=; b=j83rFVuAK/+QkkDxlTYXKEWDOzv+f1wcVXVIPqmBeIBwdBLdom3/cVIXdFoA5AWquG hGp2InsPKlGGOwk6G8s27oLijRN9MvM3hgY/A8b0KjtIkjLAL34gY7lnqhjhf/2/Abth HN+3eMgX57LngJtjKTAbOhSxUJake+6u3hesVDVtR7iWOmd6WgszSG5fXlt5ZQ51VVQk osEcY4cg2OEmH8xXj0+6LWlFMT1raQcB7rw0k6MwmbggT+PXeb84kzm0jPf7QC7Lh4NI 09lMr5wCxD35LyIE1V7hZdZ3pM7Yx/izrB3LxAQe/8bu9R+t+76p20csQsQwV9lCXGG4 +FFQ== X-Gm-Message-State: ALoCoQmPr7bKtDt/tlDqxO2MzDMO64MGXgHQ5CoiXyly5e7/jNnNf8Ov7irbxnd6FMrGKz6GV82u MIME-Version: 1.0 X-Received: by 10.180.104.234 with SMTP id gh10mr14325450wib.3.1415743991021; Tue, 11 Nov 2014 14:13:11 -0800 (PST) Received: by 10.27.93.147 with HTTP; Tue, 11 Nov 2014 14:13:10 -0800 (PST) In-Reply-To: References: Date: Tue, 11 Nov 2014 14:13:10 -0800 Message-ID: Subject: Re: Doubt on Querying JSON From: Hao Zhu To: drill-user@incubator.apache.org Content-Type: multipart/alternative; boundary=f46d04428f4aad979205079c95d5 X-Virus-Checked: Checked by ClamAV on apache.org --f46d04428f4aad979205079c95d5 Content-Type: text/plain; charset=UTF-8 Hi Jim, I can successfully reproduce the issue you mentioned. 1. Put 2 json files with the same content -- 1.json and 2.json. $ cat 1.json > { > "timestamp":1415688106710, > "status":"OK", > "total":17, > "data":[ > { "volumename":"a", "actualreplication":[0,0,0], > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }, > { "volumename":"b", "actualreplication":[0,0,0], > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }, > { "volumename":"c", "actualreplication":[0,0,0], > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 } > ] > } 2. Then this SQL hung: > > select `timestamp` as newtime,flatten(data) as newdata from > dfs.tmp.`*.json`; > +------------+------------+ > | newtime | newdata | > +------------+------------+ > | 1415688106710 | > {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0} > | > | 1415688106710 | > {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0} > | > | 1415688106710 | > {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0} > | > It should produce 6 rows instead. Thanks, Hao On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates wrote: > I had trouble with flatten extending past one wile. If you include more > than 1 json file in your data set it never seams to finish the command. In > my case it returned in 20 sec with one file but when the data set had two > files it never finiahed after 5 min so I gave up till a later release comes > along. > On Nov 11, 2014 11:57 AM, "Hao Zhu" wrote: > > > Hi Mufeed, > > > > I tested on latest 0.7 build and flatten can work in your case. > > > > with sub as > > > (select `timestamp` as newtime,flatten(data) as newdata from > > > dfs.tmp.`1.json`) > > > select sub.newtime, sub.newdata.volumename, > > > > > > sub.newdata.actualreplication,sub.newdata.InodesExceededAlarm,sub.newdata.ContainersNonLocalAlarm > > > from sub; > > > > > > +------------+------------+------------+------------+------------+ > > > | newtime | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | > > > +------------+------------+------------+------------+------------+ > > > | 1415688106710 | a | [0,0,0] | 0 | 0 | > > > | 1415688106710 | b | [0,0,0] | 0 | 0 | > > > | 1415688106710 | c | [0,0,0] | 0 | 0 | > > > +------------+------------+------------+------------+------------+ > > > 3 rows selected (0.158 seconds) > > > > Regarding your errors: > > > > 1. select timestamp from `1.json`; > > > > Since "timestamp" is a reserved word in Drill, so it should be changed > to: > > > > select `timestamp` from `1.json`; > > > > > > 2. select timestamp[0] from `1.json`; > > > > Since "timestamp" is not an array, so we can not use "[0]" to specify the > > 1st element. > > > > 3. select status[0] from `1.json`; > > > > The same as 2, "status" is not an array. > > > > Thanks, > > > > Hao > > > > On Tue, Nov 11, 2014 at 7:22 AM, Jacques Nadeau > > wrote: > > > > > We're indeed seeing some issues with the current master (development) > > > branch and flatten functionality. There a number of open bugs right > now > > > against it. Can you review the open bugs and see if they describe the > > > problem you are seeing? > > > > > > thanks > > > > > > On Tue, Nov 11, 2014 at 5:35 AM, mufy wrote: > > > > > > > I have the following. > > > > > > > > { > > > > "timestamp":1415688106710, > > > > "status":"OK", > > > > "total":17, > > > > "data":[ > > > > { "volumename":"a", "actualreplication":[0,0,0], > > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }, > > > > { "volumename":"b", "actualreplication":[0,0,0], > > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }, > > > > { "volumename":"c", "actualreplication":[0,0,0], > > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 } > > > > ] > > > > } > > > > > > > > But query results are "partial". > > > > > > > > 0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`; > > > > +------------+ > > > > | EXPR$0 | > > > > +------------+ > > > > | > > > > > > > > > > {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0} > > > > | > > > > +------------+ > > > > 1 row selected (0.139 seconds) > > > > 0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`; > > > > +------------+ > > > > | EXPR$0 | > > > > +------------+ > > > > | > > > > > > > > > > {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0} > > > > | > > > > +------------+ > > > > 1 row selected (0.125 seconds) > > > > > > > > > > > > > > > > 0: jdbc:drill:zk=n67:5181> select * from `1.json`; > > > > +------------+------------+------------+------------+ > > > > | timestamp | status | total | data | > > > > +------------+------------+------------+------------+ > > > > | 1415688106710 | OK | 17 | > > > > > > > > > > > > > > [{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0] > > > > | > > > > +------------+------------+------------+------------+ > > > > 1 row selected (0.22 seconds) > > > > > > > > > > > > > > > > 0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`; > > > > Query failed: Failure while parsing sql. > > > > > > > > Error: exception while executing query: Failure while executing > query. > > > > (state=,code=0) > > > > 0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`; > > > > Query failed: Failure while parsing sql. > > > > > > > > Error: exception while executing query: Failure while executing > query. > > > > (state=,code=0) > > > > 0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`; > > > > +------------+ > > > > | EXPR$0 | > > > > +------------+ > > > > Query failed: Failure while running fragment. > > > > > > > > java.lang.RuntimeException: java.sql.SQLException: Failure while > > > > executing query. > > > > at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) > > > > at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) > > > > at sqlline.SqlLine.print(SqlLine.java:1809) > > > > at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) > > > > at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) > > > > at sqlline.SqlLine.dispatch(SqlLine.java:889) > > > > at sqlline.SqlLine.begin(SqlLine.java:763) > > > > at sqlline.SqlLine.start(SqlLine.java:498) > > > > at sqlline.SqlLine.main(SqlLine.java:460) > > > > > > > > > > --f46d04428f4aad979205079c95d5--