impala-dev mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Question about the multi-thread scan node model
Date Thu, 31 Aug 2017 16:53:47 GMT
I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
it's a true ramp-up task. The code change is easy but I think we would want
to do performance validation and testing to make sure that the new
multithreaded scanners have similar performance and stability before making
them the default.

On Thu, Aug 31, 2017 at 12:34 AM, huangquanlong@gmail.com <
huangquanlong@gmail.com> wrote:

> Yeah, "compute stats" is really CPU bound. That sounds great!
>
> I noticed that one of the subtasks of the multithreading work is labeled
> "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> Is this in progress? If not, could you assign it to me so I can get familiar
> with the latest framework?
>
> Thanks,
> Quanlong
>
> On 2017-08-31 07:16, Tim Armstrong <tarmstrong@cloudera.com> wrote:
> > Hi,
> >   The new scanner model is part of the multithreading work to support
> > running multiple instances of each fragment on each Impala daemon. The
> > idea there is that parallelisation is done at the fragment level so that
> > all execution including aggregations, sorts, joins is parallelised - not
> > just scans. This is enabled by setting mt_dop > 0. Currently it doesn't
> > work for plans including joins and HDFS inserts.
> >
> > We find that a lot of queries are compute bound, particularly by
> > aggregations and joins. In those cases we get big speedups from the newer
> > multithreading model. E.g. "compute stats" is a lot faster.
> >
> > On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanlong@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > >
> > > I’m working on applying our ORC-support patch to the latest code base
> > > (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>).
> > > Since our patch is based on cdh-5.7.3-release, which was released one
> > > year ago, there’s a lot of work to merge it.
> > >
> > >
> > > One of the biggest changes since cdh-5.7.3-release that I’ve noticed is
> > > the new scan node & scanner model introduced in IMPALA-3902
> > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it’s
> > > inspired by the investigation task in IMPALA-2849
> > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find
> > > any performance report in that issue. Could you share any reports on
> > > this multi-threaded refactor?
> > >
> > >
> > > I’m wondering how much this can improve performance, since the old
> > > model (a single-threaded scan node with multi-threaded scanners) already
> > > provides concurrent IO for reads, and most OLAP queries are IO bound.
> > >
> > >
> > > Thanks,
> > >
> > > Quanlong
> > >
> >
>
