impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Behm <alex.b...@cloudera.com>
Subject Re: How many threads impala start for handling partitioned join?
Date Fri, 27 Oct 2017 15:47:26 GMT
The way we intend multithreading to work is that a parameter MT_DOP will
control how many fragment instances per node are going to be run.
Today, MT_DOP works for simple queries that do not involve joins - but it
does not yet work for joins.

On Thu, Oct 26, 2017 at 11:33 PM, 俊杰陈 <cjjnjust@gmail.com> wrote:

> Thanks, Alex!
>
> My question maybe can impala start multiple fragment instances for a
> particular plan fragment on a single node,  for example, I have 5 fragment
> instances for a plan fragment say F01 on a 5 nodes cluster, is that
> possible to have 10 F01 instances on 5 nodes, 2 F01 instances per node?
>
> 2017-10-27 13:41 GMT+08:00 Alexander Behm <alex.behm@cloudera.com>:
>
> > The multithreading effort is still ongoing. Joins, in particular, are not
> > executed with multiple threads yet.
> >
> > Not sure if I completely followed your last two questions, please correct
> > me if I misunderstood.
> > The general idea of the multithreading effort is to start multiple
> fragment
> > instances per host. A fragment instance may contain an exchange node.
> >
> >
> > On Wed, Oct 25, 2017 at 7:22 PM, 俊杰陈 <cjjnjust@gmail.com> wrote:
> >
> > > Thanks for the reply.
> > >
> > > I saw IMPALA-3902 <https://issues.apache.org/jira/browse/IMPALA-3902>
> > > seems
> > > to add support for multithread execution.  It describes the goal is to
> > > support running multiple fragment instances on a single node, is that
> > means
> > > coordinator generate multiple instances for a plan fragment on a single
> > > node so that starts multiple exchange nodes to receive data and
> process?
> > Or
> > > it starts instances for different plan fragments for preparing
> > > the streaming?
> > >
> > > 2017-10-25 22:08 GMT+08:00 Jeszy <jeszyb@gmail.com>:
> > >
> > > > Hello JJ,
> > > >
> > > > No, currently Impala uses one thread to execute the join (without
> > > > regard for the amount of partitions that fit into memory).
> > > >
> > > > HTH
> > > >
> > > > On 25 October 2017 at 05:44, 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > Hi
> > > > >
> > > > > When Impala does a partitioned join on a node, it split the build
> > input
> > > > > into partitions until a partition can fit into memory and consume
> the
> > > > probe
> > > > > input then do the join and output rows.
> > > > >
> > > > > My question is will impala schedule multiple tasks to do join if
> > > multiple
> > > > > partitions fit into memory, or iterate over partitions? And for one
> > > > > partition does it use multiple threads to do join?  Thanks in
> > advanced.
> > > > >
> > > > >
> > > > > JJ
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Best Regards
> > >
> >
>
>
>
> --
> Thanks & Best Regards
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message