impala-dev mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Question about the multi-thread scan node model
Date Wed, 30 Aug 2017 23:16:53 GMT
Hi,
The new scanner model is part of the multithreading work to support
running multiple instances of each fragment on each Impala daemon. The idea
is that parallelisation is done at the fragment level, so that all
execution (aggregations, sorts, joins) is parallelised, not just scans.
This is enabled by setting mt_dop > 0. Currently it doesn't work for
plans that include joins or HDFS inserts.

We find that a lot of queries are compute bound, particularly by
aggregations and joins. In those cases we get big speedups from the newer
multithreading model. E.g. "compute stats" is a lot faster.
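
To make the difference concrete, here is a toy Python sketch of the two
shapes. It is not Impala code: the scan ranges, function names and the
"dop" parameter are all made up for illustration, and Python threads won't
show a real speedup on compute-bound work. It only shows where the
aggregation work sits in each model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for scan ranges: each inner list is one range of keys.
SCAN_RANGES = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"], ["b"]]


def scan(scan_range):
    """Pretend to read one scan range and return its rows."""
    return list(scan_range)


def scan_parallel_aggregate_serial(ranges, num_scanner_threads):
    """Older shape: scanner threads read concurrently, one thread aggregates."""
    with ThreadPoolExecutor(max_workers=num_scanner_threads) as pool:
        batches = list(pool.map(scan, ranges))
    counts = Counter()
    for batch in batches:  # a single consumer does all the aggregation work
        counts.update(batch)
    return counts


def fragment_instance(ranges):
    """One instance scans and pre-aggregates only its own ranges."""
    counts = Counter()
    for scan_range in ranges:
        counts.update(scan(scan_range))
    return counts


def fragment_parallel(ranges, dop):
    """Newer shape: `dop` instances each scan and aggregate, then merge."""
    assignments = [ranges[i::dop] for i in range(dop)]  # round-robin split
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(fragment_instance, assignments))
    merged = Counter()
    for partial in partials:  # final merge of the partial aggregates
        merged.update(partial)
    return merged
```

Both functions return the same counts. In the second one the per-row
aggregation is spread across the instances and only the small partial
results are merged at the end, which is where the gain for compute-bound
queries would come from.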

On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanlong@gmail.com> wrote:

> Hi all,
>
>
> I’m working on porting our ORC-support patch to the latest code base (
> IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since
> our patch is based on cdh-5.7.3-release, which was released a year ago,
> there is a lot of work involved in merging it.
>
>
> One of the biggest changes since cdh-5.7.3-release that I have noticed is
> the new scan node & scanner model introduced in IMPALA-3902
> <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it was inspired
> by the investigation task in IMPALA-2849
> <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any
> performance report in that issue. Could you share a report on this
> multi-thread refactor?
>
>
> I’m wondering how much this can improve performance, since the old model
> (single-threaded scan node with multi-threaded scanners) already provides
> concurrent IO for reading, and most OLAP queries are IO bound.
>
>
> Thanks,
>
> Quanlong
>
