From: Jim Bates
Date: Tue, 25 Nov 2014 21:02:08 -0600
Subject: Re: 6 to 7 min delay in closing query when pulling over multiple json files using drill-0.6.0.28642.r2-1.noarch
To: Drill-User

Didn't get a hit on this, so I'm sending it for round 2...

When executing a query against a specific file and limiting it to 1 row, the query returns in under a second. When keeping the same limit but widening the scope to several directories of JSON files, it returns the single row quickly but can then take 7 to 10 min to "finish". That delay forces one to configure a timeout of 600 to 1200 sec in the ODBC connector, or the query will fail. Any workarounds for this?

Query to a single file:

select * FROM (select `dir0` as `city`,
    to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
    flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json` limit 1) a limit 1;
+------------+---------------+------------+
| city | executionTime | stations |
+------------+---------------+------------+
| null | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
+------------+---------------+------------+
1 row selected (0.542 seconds)

When executing over a larger scope, it returns the first row in 3 sec but does not close the query for another 6 or 7 minutes:

select * FROM (select `dir0` as `city`,
    to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
    flatten(`stationBeanList`) as `stations`
  FROM `data`.`all_bikes`.`../bikes` limit 1) a limit 1;
+------------+---------------+------------+
| city | executionTime | stations |
+------------+---------------+------------+
| chicago | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |   <--- At this point in 3 sec
+------------+---------------+------------+
1 row selected (683.15 seconds)

On Mon, Nov 24, 2014 at 10:00 PM, Jim Bates wrote:

> When executing a query to a specific file and limiting to 1, the query
> returns in under a second:
>
> select * FROM (select `dir0` as `city`,
>     to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
>     flatten(`stationBeanList`) as `stations`
>   FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json` limit 1) a limit 1;
> +------------+---------------+------------+
> | city | executionTime | stations |
> +------------+---------------+------------+
> | null | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
> +------------+---------------+------------+
> 1 row selected (0.567 seconds)
>
> When executing over a larger scope, it returns the first row in 3 sec but
> does not close the query for another 6 or 7 minutes:
>
> select * FROM (select `dir0` as `city`,
>     to_timestamp(`executionTime`,'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
>     flatten(`stationBeanList`) as `stations`
>   FROM `data`.`all_bikes`.`../bikes` limit 1) a limit 1;
> +------------+---------------+------------+
> | city | executionTime | stations |
> +------------+---------------+------------+
> | chicago | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |   <--- At this point in 3 sec
> +------------+---------------+------------+
> 1 row selected (496.05 seconds)
>
> Any reason that might be?
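
For anyone trying to reproduce this from a client, here is a minimal sketch of how the two numbers above (time to first row versus time until the query closes) could be measured separately. It is only an illustration: `time_first_row_and_close` and `simulated_query` are hypothetical names, and the generator merely stands in for a real ODBC or JDBC result set. Nothing here is Drill's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_first_row_and_close(
    rows: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, float, int]:
    """Return (seconds to first row, seconds until rows are exhausted, row count)."""
    start = clock()
    first_row_at = None
    count = 0
    for _ in rows:
        if first_row_at is None:
            first_row_at = clock()  # the moment the first row reaches the client
        count += 1
    closed_at = clock()  # the moment the result set reports completion
    first = (first_row_at - start) if first_row_at is not None else float("nan")
    return first, closed_at - start, count


def simulated_query(first_row_delay: float, close_delay: float) -> Iterator[dict]:
    """Hypothetical stand-in for a cursor: one row, then a stall before finishing."""
    time.sleep(first_row_delay)
    yield {"city": "chicago"}
    time.sleep(close_delay)  # models the wait between the last row and completion


if __name__ == "__main__":
    first, total, n = time_first_row_and_close(simulated_query(0.03, 0.07))
    print(f"rows={n} first_row={first:.2f}s closed={total:.2f}s gap={total - first:.2f}s")
```

With a real connection, the iterable would be whatever row iterator the driver returns. The gap between the two timings is what the connector timeout has to cover.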