drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Sub-select
Date Tue, 11 Nov 2014 01:56:25 GMT
Also, there are some limitations on using joins and in queries on many data
sources.

For instance, with HBase or MapR DB, the IN will result in a full table
scan even if the left hand side of the IN is the key.  Hakim is working on
a UDF to simulate the desired behavior until the HBase data source can be
extended with a magic join operator.



On Mon, Nov 10, 2014 at 6:11 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:

> The In subquery and Join Subquery should work on both 0.6 and 0.7.  For
> example, TPCH Q18 has an IN-subquery and runs fine on both version.
>
> SELECT ...
> .....
> where
>   o.o_orderkey in (
>     select
>       l_orderkey
>     from
>       cp.`tpch/lineitem.parquet`
>     group by
>       l_orderkey having
>         sum(l_quantity) > 300
>   )
> ...
>
> On the other hand, there is some known issue when IN-subquery is applied to
> complex data type column (See DRILL-1397).  Are you using complex data type
> column in your IN-SUBQ? If possible, can you post a failing query example (
> modify the query to remove any sensitive information)?  We would like to
> take a look at what the failing query look like.
>
>
>
>
>
>
>
> On Mon, Nov 10, 2014 at 3:24 PM, Hao Zhu <hzhu@maprtech.com> wrote:
>
> > Confirmed. The same on 0.6.0.
> >
> > On Mon, Nov 10, 2014 at 3:16 PM, Richard Shaw <rshaw@maprtech.com>
> wrote:
> >
> > > Please can you confirm the subquery doesn't work on 0.6.0?
> > >
> > > Thank you
> > >
> > > On 10 November 2014 23:12, Hao Zhu <hzhu@maprtech.com> wrote:
> > >
> > >> Both subquery should work, see below tests in 0.7:
> > >>
> > >> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv`;
> > >>> +------------+------------+
> > >>> |   EXPR$0   |   EXPR$1   |
> > >>> +------------+------------+
> > >>> | 1          | a          |
> > >>> | 2          | b          |
> > >>> | 3          | c          |
> > >>> +------------+------------+
> > >>> 3 rows selected (0.082 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[0],columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`;
> > >>> +------------+------------+
> > >>> |   EXPR$0   |   EXPR$1   |
> > >>> +------------+------------+
> > >>> | 2          | b          |
> > >>> | 3          | c          |
> > >>> +------------+------------+
> > >>> 2 rows selected (0.088 seconds)
> > >>> 0: jdbc:drill:zk=local> select columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` where
> > >>> . . . . . . . . . . . > columns[0] in (select columns[0] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv`);
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | b          |
> > >>> | c          |
> > >>> +------------+
> > >>> 2 rows selected (0.586 seconds)
> > >>> 0: jdbc:drill:zk=local>
> > >>> 0: jdbc:drill:zk=local> select a.columns[1] from
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub1.csv` as a,
> > >>> . . . . . . . . . . . >
> > >>> dfs.root.`/Users/hzu/Documents/sharefolder/test/sub2.csv` as b
> > >>> . . . . . . . . . . . > where a.columns[0]=b.columns[0];
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | b          |
> > >>> | c          |
> > >>> +------------+
> > >>> 2 rows selected (0.356 seconds)
> > >>
> > >>
> > >>
> > >> On Mon, Nov 10, 2014 at 3:02 PM, Richard Shaw <rshaw@maprtech.com>
> > wrote:
> > >>
> > >>> Hi, please can you confirm if the following type of sub-select query
> is
> > >>> possible or how else to go about it.
> > >>>
> > >>> Case 1:
> > >>>
> > >>> Select …….
> > >>> From fish_data
> > >>> Where  “fish_id”  in (select  “fish_id”  from ….. where ….)
> > >>>
> > >>> Case 2:
> > >>>
> > >>> Select  …
> > >>> From fish_data  fishd
> > >>> Join (select … from  where …..)  fishtime   on fishtime.fishid
=
> > >>> fishd.fishid
> > >>>
> > >>> Based on an existing query
> > >>>
> > >>> MS SQL Example:
> > >>>
> > >>> SELECT TOP 1
> > >>>             fishId,
> > >>>             datetime AS dataStart
> > >>> FROM
> > >>>             fish_data
> > >>> WHERE
> > >>>             fishId = '353234025714222'
> > >>>             AND DATEDIFF(
> > >>>                          dd,
> > >>>                          (SELECT MIN(cake.datetime) AS minDate FROM
> > >>> fish_data AS cake WHERE cake.fishId = '353234025714222'),
> > >>>                          datetime
> > >>>             ) > 5
> > >>> ORDER BY
> > >>>             datetime ASC
> > >>>
> > >>> Thank you
> > >>>
> > >>> Kind Regards,
> > >>> Richard
> > >>>
> > >>
> > >>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message