drill-issues mailing list archives

From "Daniel Barclay (Drill) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3722) LIMIT 1 query on top of a dir with 50K files takes ~150 seconds
Date Fri, 28 Aug 2015 22:44:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720711#comment-14720711 ]

Daniel Barclay (Drill) commented on DRILL-3722:
-----------------------------------------------

Flink handles cases like that (LIMIT on many-file queries) by processing the results from
reading some files before reading starts on all files. That means that when some fragment
determines that it has enough data (per LIMIT), many planned file reads can be abandoned before
they even start.

Would that approach work for Drill?
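
To make the idea concrete, here is a minimal single-threaded sketch of LIMIT-driven early termination. It is only an illustration of the approach described above, not Flink's or Drill's actual implementation; `scan_with_limit`, `read_file`, and `fake_read` are hypothetical names introduced for this example.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan_with_limit(
    paths: Iterable[str],
    read_file: Callable[[str], Iterator[Row]],
    limit: int,
) -> List[Row]:
    """Return up to `limit` rows, starting a file read only when needed.

    Hypothetical helper: files are read one after another, and a read is
    started only if the files before it did not yield enough rows.
    """

    def rows() -> Iterator[Row]:
        for path in paths:
            # The read for `path` starts only when the consumer asks for
            # a row that earlier files could not supply.
            yield from read_file(path)

    # islice stops pulling rows once `limit` is reached, so reads for the
    # remaining paths are never started.
    return list(islice(rows(), limit))


# Stand-in reader (an assumption for the demo): records which files were
# actually opened and yields three synthetic rows per file.
opened: List[str] = []


def fake_read(path: str) -> Iterator[Row]:
    opened.append(path)
    for i in range(3):
        yield {"file": path, "row": i}


many_paths = [f"part-{n:05d}" for n in range(50_000)]
first = scan_with_limit(many_paths, fake_read, limit=1)
print(first, "files opened:", len(opened))
```

In a parallel engine the same effect would need some cancellation signal between fragments; this sketch sidesteps that by reading sequentially.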

> LIMIT 1 query on top of a dir with 50K files takes ~150 seconds
> ---------------------------------------------------------------
>
>                 Key: DRILL-3722
>                 URL: https://issues.apache.org/jira/browse/DRILL-3722
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Jinfeng Ni
>
> git.commit.id.abbrev=445790f
> I ran the query below on the TPCH SF100 lineitem table, which is stored as 50K files. Given the nature of the query, Drill seems very slow in handling it.
> {code}
> select * from lineitem limit 1;
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> | L_ORDERKEY  | L_PARTKEY  | L_SUPPKEY  | L_LINENUMBER  | L_QUANTITY  | L_EXTENDEDPRICE  | L_DISCOUNT  | L_TAX  | L_RETURNFLAG  | L_LINESTATUS  |  L_SHIPDATE  | L_COMMITDATE  | L_RECEIPTDATE  | L_SHIPINSTRUCT  |  L_SHIPMODE  |  L_COMMENT   |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> | 456884480   | 19781678   | 781679     | 1             | 21.0        | 36932.49         | 0.1         | 0.03   | [B@44f54509   | [B@4287753d   | [B@4b2219ea  | [B@2bd3782f   | [B@48776c23    | [B@185c9300     | [B@65b6f17e  | [B@4da8bb5d  |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> 1 row selected (158.976 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
