impala-dev mailing list archives

From "huangquanlong@gmail.com"<huangquanl...@gmail.com>
Subject Re: Question about the multi-thread scan node model
Date Thu, 31 Aug 2017 07:34:03 GMT
Yeah, "compute stats" is really CPU bound. That sounds great!

I noticed that one of the subtasks of the multithreading work is labeled "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
Is it in progress? If not, could you assign it to me so that I can get familiar with the latest framework?

Thanks,
Quanlong

On 2017-08-31 07:16, Tim Armstrong <tarmstrong@cloudera.com> wrote: 
> Hi,
>   The new scanner model is part of the multithreading work to support
> running multiple instances of each fragment on each Impala daemon. The idea
> there is that parallelisation is done at the fragment level so that all
> execution including aggregations, sorts, joins is parallelised - not just
> scans. This is enabled by setting mt_dop > 0. Currently it doesn't work for
> plans including joins and HDFS inserts.
> 
> We find that a lot of queries are compute bound, particularly by
> aggregations and joins. In those cases we get big speedups from the newer
> multithreading model. E.g. "compute stats" is a lot faster.
> 
> On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanlong@gmail.com> wrote:
> 
> > Hi all,
> >
> >
> > I’m working on applying our orc-support patch to the latest code base (
> > IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since our
> > patch is based on cdh-5.7.3-release, which was released one year ago,
> > there’s a lot of work involved in merging it.
> >
> >
> > One of the biggest changes since cdh-5.7.3-release that I noticed is the new scan
> > node & scanner model introduced in IMPALA-3902
> > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it was inspired
> > by the investigation task in IMPALA-2849
> > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any
> > performance report in that issue. Could you share any reports on this
> > multi-thread refactor?
> >
> >
> > I’m wondering how much this can improve performance, since the old model
> > (a single-threaded scan node with multi-threaded scanners) already provides
> > concurrent IO for reading, and most OLAP queries are IO bound.
> >
> >
> > Thanks,
> >
> > Quanlong
> >
> 
