drill-dev mailing list archives

From "Paul Makkar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5414) Issue with Querying Directories
Date Wed, 05 Apr 2017 15:16:41 GMT
Paul Makkar created DRILL-5414:
----------------------------------

             Summary: Issue with Querying Directories
                 Key: DRILL-5414
                 URL: https://issues.apache.org/jira/browse/DRILL-5414
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.10.0
         Environment: Kubernetes running Debian GNU/Linux 8 containers.
openjdk version "1.8.0_111".
AWS.
Using s3 buckets
            Reporter: Paul Makkar


Hi

Thanks for Apache Drill - it's pretty awesome :)

I'm hoping to exploit drill directory querying and have structured my data archive in s3 to
test this. However, I've got an issue using directory querying.

My directory structure in s3 is like:
s3/devices_by_id/device_id/2016/11/12/<filename>.json.gz

From the documentation, I figured the following queries were equivalent:

select count(*) from `s3`.`/deviceid/xyz/2016/11/` ;
+---------+
| EXPR$0  |
+---------+
| 286049  |
+---------+
1 row selected (10.351 seconds)

select count(*) from `s3`.`/deviceid/` where dir0='xyz' and dir1='2016' and dir2='11';

But this latter query just hangs, and no profile appears in the UI. When I press Ctrl-C, I get:

+--+
|  |
+--+
+--+
No rows selected (1481.727 seconds)
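
For reference, the equivalence assumed above rests on Drill exposing each subdirectory level below the queried directory as an implicit column (dir0, dir1, dir2, ...). The following is a minimal Python sketch of that mapping only. The function names and the sample file name are hypothetical, and it does not model Drill's planner or how it lists objects in s3, which is where the hang presumably occurs.

```python
from pathlib import PurePosixPath


def dir_columns(query_root: str, file_path: str) -> dict:
    """Map subdirectories between query_root and the file to dir0, dir1, ...

    Illustrative model only: it mirrors the documented meaning of Drill's
    implicit dirN columns, not Drill's actual implementation.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(query_root))
    # Every component except the last one (the file name) is a directory level.
    return {f"dir{i}": part for i, part in enumerate(relative.parts[:-1])}


def matches(columns: dict, **wanted: str) -> bool:
    """Emulate a WHERE clause of the form dir0='..' AND dir1='..'."""
    return all(columns.get(name) == value for name, value in wanted.items())


# Hypothetical file name; the directory layout follows the queries above.
cols = dir_columns("/deviceid", "/deviceid/xyz/2016/11/12/sample.json.gz")
print(cols)
print(matches(cols, dir0="xyz", dir1="2016", dir2="11"))
```

Under this model, a file selected by the first query (rooted at /deviceid/xyz/2016/11/) is also selected by the dir0/dir1/dir2 filter in the second query, which is why the two counts would be expected to agree.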

If I try to run an explain plan, that also hangs.

There are a total of 13283 compressed JSON files under 2016/11 in the s3 bucket.

The log doesn't show much information.

Can anyone help with this, please? I can provide more information as required. Hopefully
this is not user error.






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
