drill-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-6061) Feature Request: Global Query List showing queries from all Drill foreman nodes
Date Tue, 16 Jan 2018 11:09:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327012#comment-16327012 ]

Hari Sekhon edited comment on DRILL-6061 at 1/16/18 11:08 AM:
--------------------------------------------------------------

Yes, this is what we found after raising this issue. We will point to MapR-FS /apps/drill/pstore,
following Hadoop layout best-practice conventions, and test.

I think this should be documented a bit better and made easier to find, perhaps in the FAQ or
in a section titled something like "Global Query List - how to see the queries on the cluster
from any Drill node" in the Apache Drill documentation. There is a MapR community thread with
a response on this as well:

[https://community.mapr.com/thread/21498-what-are-best-practices-for-managing-drill-query-profiles]

I recommend changing the placeholder {{<directory to store pstore data>}} in the Apache Drill
documentation at:

{{[https://drill.apache.org/docs/persistent-configuration-storage/]}}

to a standardized best-practice location of {{/apps/drill/pstore}}, to fall in line with other
apps on Hadoop clusters.
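
As a rough sketch of what the documentation could show, assuming the {{sys.store.provider.zk.blobroot}}
option described on the persistent-configuration-storage page and a {{maprfs://}} URI (the
cluster ID and ZooKeeper address below are placeholders, not values from our cluster, and we
have not yet tested this exact form):

```
# drill-override.conf, identical on every Drillbit (illustrative sketch only)
drill.exec: {
  cluster-id: "<cluster-id>",
  zk.connect: "<zkhostname>:<port>",
  # Assumed shared MapR-FS location for pstore data, including query profiles
  sys.store.provider.zk.blobroot: "maprfs:///apps/drill/pstore"
}
```

With every Drillbit writing to the same shared path, the query profiles should end up in one
place regardless of which node acted as foreman.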

It's also worth documenting which algorithm is used to balance load across Drill nodes when
a drillbit is acquired via ZooKeeper quorum referral (random, round robin, least connections,
etc.).


was (Author: harisekhon):
Yes this is what we found after raising this, we will point to MapR-FS /apps/drill/pstore
to follow Hadoop layout best practices convention and test.

I think this should be documented a bit better / easier to find, perhaps in FAQ or a section
stating something like "Global Query List - how to see the queries on the cluster from any
Drill node" in the Apache Drill documentation. There is a MapR community connection to response
to this as well:

https://community.mapr.com/thread/21498-what-are-best-practices-for-managing-drill-query-profiles

I recommend changing the Apache Drill documentation {{<directory to store pstore data>}}
with a single best practice path of {{/apps/drill/pstore}} to standardize this and fall
in line with other apps on Hadoop clusters.

It's also worth documenting the load balancing algorithm used for load balancing across Drill
nodes when acquiring a drillbit via zookeeper quorum referral (random, round robin, least
connection etc).

> Feature Request: Global Query List showing queries from all Drill foreman nodes
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-6061
>                 URL: https://issues.apache.org/jira/browse/DRILL-6061
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components:  Server, Documentation, Metadata, Query Planning & Optimization,
Tools, Build & Test, Web Server
>    Affects Versions: 1.11.0
>         Environment: MapR 5.2
>            Reporter: Hari Sekhon
>            Priority: Major
>
> Feature Request to add a Global Query List to show all queries executed across all Drill
nodes in a cluster for better management and auditing.
> Right now there doesn't appear to be a way to see all queries across all nodes in a Drill
cluster. The Web UI on any given Drill node only shows the queries coordinated by that local
node if acting as the foreman for the query, so if using ZooKeeper or a Load Balancer to distribute
queries via different Drill nodes then the query list will be spread across lots of different
nodes with no global timeline of queries.
> This seems to leave a bit of a gap in auditing functionality. The only immediately available
option that I can think of is to limit all query submissions to a single foreman node so that
the query list is complete on that node, although that doesn't seem like a great idea in terms
of load distribution of query planning, coordination and final aggregation steps. I've made
load balancing configurations for Apache Drill and similar technologies that could be used for
that purpose, with failover support to maintain high availability (at https://github.com/HariSekhon/nagios-plugins/tree/master/haproxy),
but would still prefer Drill to be designed to store the global list of queries submitted
in a centralized place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
