drill-user mailing list archives

From Suresh Ollala <soll...@maprtech.com>
Subject Re: 6 to 7 min delay in closing query when pulling over multiple json files using drill-0.6.0.28642.r2-1.noarch
Date Wed, 26 Nov 2014 04:59:25 GMT
You might be hitting DRILL-1681.

On Tue, Nov 25, 2014 at 7:02 PM, Jim Bates <jbates@maprtech.com> wrote:

> Didn't get any replies on this, so I'm sending it for round 2...
>
> When executing a query against a specific file and limiting it to 1 returned
> row, the query returns in under a second. When keeping the same limit but
> widening the scope to several directories of JSON files, it returns the
> single row quickly but can take 7 to 10 min to "finish". That delay forces
> you to configure a timeout of 600 to 1200 sec in the ODBC connector;
> otherwise the query fails.
>
> Any workarounds for this?
>
> Query to a single file:
> select * FROM (select `dir0` as `city`,
>   to_timestamp(`executionTime`, 'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
>   flatten(`stationBeanList`) as `stations`
>   FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json`
>   limit 1) a limit 1;
> +------------+---------------+------------+
> |    city    | executionTime |  stations  |
> +------------+---------------+------------+
> | null       | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
> +------------+---------------+------------+
> 1 row selected (0.542 seconds)
>
> When executing over a larger scope, it returns the first row in 3 sec but
> does not close the query for another 6 or 7 minutes:
> select * FROM (select `dir0` as `city`,
>   to_timestamp(`executionTime`, 'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
>   flatten(`stationBeanList`) as `stations`
>   FROM `data`.`all_bikes`.`../bikes`
>   limit 1) a limit 1;
> +------------+---------------+------------+
> |    city    | executionTime |  stations  |
> +------------+---------------+------------+
> | chicago    | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |  <--- At this point in 3 sec
> +------------+---------------+------------+
> 1 row selected (683.15 seconds)
>
>
> On Mon, Nov 24, 2014 at 10:00 PM, Jim Bates <jbates@maprtech.com> wrote:
>
> > When executing a query against a specific file and limiting it to 1 row,
> > the query returns in under a second:
> > select * FROM (select `dir0` as `city`,
> >   to_timestamp(`executionTime`, 'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
> >   flatten(`stationBeanList`) as `stations`
> >   FROM `data`.`all_bikes`.`../bikes/chicago/bikestations/1416875401.json`
> >   limit 1) a limit 1;
> > +------------+---------------+------------+
> > |    city    | executionTime |  stations  |
> > +------------+---------------+------------+
> > | null       | 2014-11-24 18:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":12,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":7,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |
> > +------------+---------------+------------+
> > 1 row selected (0.567 seconds)
> >
> > When executing over a larger scope, it returns the first row in 3 sec but
> > does not close the query for another 6 or 7 minutes:
> > select * FROM (select `dir0` as `city`,
> >   to_timestamp(`executionTime`, 'YYYY-MM-dd hh:mm:ss a') as `executionTime`,
> >   flatten(`stationBeanList`) as `stations`
> >   FROM `data`.`all_bikes`.`../bikes`
> >   limit 1) a limit 1;
> > +------------+---------------+------------+
> > |    city    | executionTime |  stations  |
> > +------------+---------------+------------+
> > | chicago    | 2014-11-17 23:29:01.0 | {"id":5,"stationName":"State St & Harrison St","availableDocks":8,"totalDocks":19,"latitude":41.8739580629,"longitude":-87.6277394859,"statusValue":"In Service","statusKey":1,"availableBikes":11,"stAddress1":"State St & Harrison St","stAddress2":"","city":"","postalCode":"","location":"620 S. State St.","altitude":"","testStation":false,"landMark":"030"} |  <--- At this point in 3 sec
> > +------------+---------------+------------+
> > 1 row selected (496.05 seconds)
> >
> > Any idea why that might be?
> >
> >
>
