drill-dev mailing list archives

From weijie tong <tongweijie...@gmail.com>
Subject Re: Questions about Drill's multi-thread model
Date Sat, 22 Jul 2017 06:30:46 GMT
Sorry, for other reasons this plugin is not available for
others to use.

Yes, there is additional throttling on the storage side. I don't
understand the throttling dependency between the two systems you
described. The goal of our online Drill system is to serve as many
queries as possible and to respond quickly, within seconds. Because the
storage layer has a concurrency limit, long-running queries should be
flagged for later review and killed.

I really appreciate the design that abstracts out Drill's throttling
mechanism so that others can implement their own strategies, but I am
not sure whether it can solve this problem. I guess your strategy is to
satisfy existing queries regardless of whether they are good or bad,
and to preserve the storage's service capacity by rejecting or queuing
incoming Drill queries. If so, that is not what we need.
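A minimal sketch of the kill-rather-than-queue policy described above, assuming each running query can be represented as a cancellable `java.util.concurrent.Future`. `QueryWatchdog` and its time budget are hypothetical illustrations, not Drill APIs; a real Drillbit would cancel through Drill's own query-cancellation path rather than `Future.cancel`.

```java
import java.util.concurrent.*;

// Hypothetical watchdog: cancels any query that runs past a fixed time budget,
// so a stuck query cannot hold storage-side capacity indefinitely.
class QueryWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private final long budgetMillis;

    QueryWatchdog(long budgetMillis) {
        this.budgetMillis = budgetMillis;
    }

    // Schedule a cancellation check; it does nothing if the query already finished.
    <T> Future<T> watch(Future<T> query) {
        timer.schedule(() -> {
            if (!query.isDone()) {
                // A real system would also record the query here for later review.
                query.cancel(true); // interrupts the thread running the query
            }
        }, budgetMillis, TimeUnit.MILLISECONDS);
        return query;
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Queries that finish within the budget are unaffected; only those that overrun are cancelled, which frees storage-side capacity without queuing new arrivals.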

On Sat, 22 Jul 2017 at 1:22 AM Paul Rogers <progers@mapr.com> wrote:

> Thanks for the info! Very cool that you got Drill to work with Druid. Is
> this plugin available for others to use?
>
> Since your data source has a limit, you’ll want to apply throttling in
> Drill in the Foreman before queries launch onto the cluster. At present,
> Drill has the throttling mechanism I mentioned: ZK-based queues that
> throttle all queries.
>
> What you really need is something that we’ve not seen before: additional
> throttling at the data source level, so let’s talk about that a bit.
>
> First, you need some way to know the load on Druid. If everything is on a
> single node, that is not too hard. If distributed, then you need some way to
> learn the load on (or reservations for) Druid so that each Drillbit can
> throttle based on that load. With that, you can implement first-come-first
> served using ZK or some other mechanism.
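For the single-node case above, first-come-first-served admission can be sketched with a fair `java.util.concurrent.Semaphore`, which grants permits in arrival order. `SourceGate` and the permit count are hypothetical. Across several Drillbits the permits would have to live in ZooKeeper instead, for example through Apache Curator's `InterProcessSemaphoreV2` recipe; that part is not attempted here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical per-data-source admission gate for a single Drillbit.
// A fair semaphore grants permits in FIFO order, so queries waiting
// on the data source are admitted first-come-first-served.
class SourceGate {
    private final Semaphore permits;

    SourceGate(int maxConcurrentQueries) {
        this.permits = new Semaphore(maxConcurrentQueries, true); // fair = FIFO
    }

    // Wait up to the timeout for capacity; false means "reject or retry later".
    // The timed tryAcquire honors the fairness setting.
    boolean tryEnter(long timeout, TimeUnit unit) throws InterruptedException {
        return permits.tryAcquire(timeout, unit);
    }

    // Must be called exactly once per successful tryEnter.
    void exit() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```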
>
> Then, this mechanism has to be bolted onto the Foreman. Today, the
> ZK-based queueing is in-line within the Foreman itself. But, as it turns
> out, we have a revision in the works that will abstract out the throttling
> code into an interface. That will make it far easier for you to add a
> custom throttling mechanism for Druid. If this is helpful, I can do a PR
> for the work thus far after 1.11 goes out.
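The interface revision mentioned above had not been published at the time, so the following is only a guess at the general shape of a pluggable throttle that the Foreman would call before launching a query and again after it completes. `QueryThrottle`, `admit`, `release`, and `CompositeThrottle` are hypothetical names, not Drill's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pluggable throttle the Foreman could call around each query.
interface QueryThrottle {
    // Block (or throw) until the query may run.
    void admit(String queryId) throws InterruptedException;
    // Return whatever capacity admit() reserved.
    void release(String queryId);
}

// Applies several throttles in order, e.g. a cluster-wide queue and then
// a per-data-source gate; if a later stage fails, earlier ones are released.
class CompositeThrottle implements QueryThrottle {
    private final List<QueryThrottle> parts;

    CompositeThrottle(List<QueryThrottle> parts) {
        this.parts = new ArrayList<>(parts);
    }

    @Override
    public void admit(String queryId) throws InterruptedException {
        int admitted = 0;
        try {
            for (QueryThrottle t : parts) {
                t.admit(queryId);
                admitted++;
            }
        } catch (InterruptedException | RuntimeException e) {
            // Undo partial admission so no capacity leaks.
            for (int i = admitted - 1; i >= 0; i--) {
                parts.get(i).release(queryId);
            }
            throw e;
        }
    }

    @Override
    public void release(String queryId) {
        // Release in reverse order of admission.
        for (int i = parts.size() - 1; i >= 0; i--) {
            parts.get(i).release(queryId);
        }
    }
}
```

A composite like this would let the existing cluster-wide queue and a Druid-specific gate be stacked, with the rollback in `admit` ensuring that a rejection at the second stage does not leak capacity at the first.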
>
> Finally, you have to scan the plan looking for uses of the Druid data
> source, and only add those to the Druid queue. Visitors exist which you can
> implement to obtain the required info.
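Drill already has plan visitors, as noted above; to avoid guessing at their exact signatures, the sketch below uses a simplified, hypothetical plan tree (`PlanNode`, `Scan`, `Join`) to illustrate the idea: walk the plan, collect the storage plugins it scans, and send the query through the Druid queue only if "druid" is among them. The plugin name "druid" is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical plan tree; Drill's real operators differ.
interface PlanNode {
    <R> R accept(PlanVisitor<R> visitor);
}

interface PlanVisitor<R> {
    R visitScan(Scan scan);
    R visitJoin(Join join);
}

class Scan implements PlanNode {
    final String pluginName; // e.g. "druid", "dfs"
    Scan(String pluginName) { this.pluginName = pluginName; }
    public <R> R accept(PlanVisitor<R> v) { return v.visitScan(this); }
}

class Join implements PlanNode {
    final List<PlanNode> inputs;
    Join(PlanNode... inputs) { this.inputs = Arrays.asList(inputs); }
    public <R> R accept(PlanVisitor<R> v) { return v.visitJoin(this); }
}

// Collects the names of all storage plugins the plan reads from.
class PluginCollector implements PlanVisitor<Set<String>> {
    public Set<String> visitScan(Scan scan) {
        Set<String> names = new HashSet<>();
        names.add(scan.pluginName);
        return names;
    }

    public Set<String> visitJoin(Join join) {
        Set<String> all = new HashSet<>();
        for (PlanNode input : join.inputs) {
            all.addAll(input.accept(this)); // recurse into each child
        }
        return all;
    }
}

class QueueRouter {
    // Only plans that touch Druid go through the Druid-specific gate.
    static boolean needsDruidQueue(PlanNode plan) {
        return plan.accept(new PluginCollector()).contains("druid");
    }
}
```

Plans for which `needsDruidQueue` returns false would bypass the Druid gate and start immediately.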
>
> The result would be that Druid queries block waiting for capacity on Druid
> to become available, while all other queries run immediately.
>
> - Paul
>
> > On Jul 21, 2017, at 2:31 AM, weijie tong <tongweijie178@gmail.com> wrote:
> >
> > Thanks for all your replies, @Jinfeng and @Paul Rogers.
> >
> > Our storage plugin is Druid. The cause is that the blocked query
> > exhausts the storage plugin's concurrent threads, so subsequent
> > queries are not serviced immediately by the Druid plugin.
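The failure mode described above, where one blocked query holds the storage plugin's threads, can be reproduced with a shared fixed-size pool. `SharedPluginPool`, the pool size, and the latch that stands in for a stuck Druid request are assumptions for illustration; the Druid plugin's actual threading is not described in this thread.

```java
import java.util.concurrent.*;

// Hypothetical model of a storage plugin with a small shared worker pool.
// Every query's storage requests go through the same pool, so a request
// that never returns keeps a worker busy and later requests queue behind it.
class SharedPluginPool implements AutoCloseable {
    private final ExecutorService workers;

    SharedPluginPool(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    <T> Future<T> submitRequest(Callable<T> request) {
        return workers.submit(request);
    }

    @Override
    public void close() {
        workers.shutdownNow();
    }
}
```

With a single worker, a quick request submitted after a stuck one cannot start until the stuck one returns. Bounding each storage request with a timeout, or capping how many pool threads a single query may hold, would keep one stuck request from starving the rest.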
> >
> >
> >
> > On Thu, 20 Jul 2017 at 2:15 AM Jinfeng Ni <jni@apache.org> wrote:
> >
> >> What do you mean by "one query was blocked by the storage"? Are the other
> >> queries blocked in query planning time, or execution time?
> >>
> >> I recall someone asking about a problem related to long query planning.
> >> It turns out that if one enabled storage plugin is slow to access, it
> >> affects every query, even queries that do not need to access that slow
> >> storage plugin. In your case, if the other queries are blocked at
> >> planning time, it might be because the storage plugin is slow due to
> >> the first running query. See DRILL-5089.
> >>
> >> https://issues.apache.org/jira/browse/DRILL-5089
> >>
> >> On Wed, Jul 19, 2017 at 9:14 AM, Paul Rogers <progers@mapr.com> wrote:
> >>
> >>> Hi Weijie,
> >>>
> >>> There is nothing in Drill’s design that would account for this behavior:
> >>> each query runs a separate set of threads from any other query; there is
> >>> no synchronization among queries.
> >>>
> >>> Did you, perhaps, enable Drill’s ZK-based queueing feature? That would
> >>> cause later queries to block waiting for the completion of earlier ones.
> >>>
> >>> Otherwise, perhaps there is an issue with some particular piece of code.
> >>> What type of file is being read? In what environment?
> >>>
> >>> Thanks,
> >>>
> >>> - Paul
> >>>
> >>>> On Jul 19, 2017, at 7:20 AM, weijie tong <tongweijie178@gmail.com> wrote:
> >>>>
> >>>> Hi there,
> >>>> In our production environment, if one query is blocked by the storage,
> >>>> all the queries that come after it take a very long time to run, even
> >>>> though they really need only a short time. Meanwhile, the cluster's
> >>>> load is not very high.
> >>>> I know that every Foreman runs in its own dedicated threads. I want to
> >>>> understand the I/O thread model: are the I/O threads shared among
> >>>> different Foremen's threads? How can the behavior in this scenario be
> >>>> explained?
> >>>
> >>>
> >>
>
>
