impala-user mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and How to fix this gap?
Date Wed, 03 Jan 2018 16:20:39 GMT
That's a tricky one. I have a couple of ideas but it's a bit difficult to
confirm since the profile isn't really designed to easily answer questions
like this. ProbeTime measures wall-clock time rather than actual time spent
executing on the CPU.
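
To illustrate the distinction in general terms (this is not Impala code, just a minimal Python sketch with made-up function names), the snippet below times two pieces of work with both a wall-clock timer and a CPU-time timer. The `stalled` function stands in for a thread that is waiting rather than executing, so its wall-clock time is much larger than its CPU time. A wall-clock counter alone cannot tell the two cases apart.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time consumed by this process
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def busy():
    # Hypothetical stand-in for work that actually executes on the CPU.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def stalled():
    # Hypothetical stand-in for a thread that is waiting (descheduled or
    # blocked) instead of executing: wall-clock time passes, CPU time does not.
    time.sleep(0.2)


busy_wall, busy_cpu = measure(busy)
stalled_wall, stalled_cpu = measure(stalled)

print(f"busy:    wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
print(f"stalled: wall={stalled_wall:.3f}s cpu={stalled_cpu:.3f}s")
```

If two runs show a large gap in wall-clock time but a similar amount of CPU time, the extra time was probably spent waiting rather than computing.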

My first guess is that the Kudu scan is using more CPU than the Parquet
scan, so the thread doing the scan is competing for resources more with
the join thread (either hyperthreads competing for resources on the same
physical core or threads competing for time on logical processors). You
could compare the User CPU time for the fragment instances containing the
joins and scans to see if there is also a discrepancy in CPU time. That
won't answer this directly but might provide some clues.

My second guess is that there's some kind of subtle memory locality or
scheduling effect.

On Tue, Jan 2, 2018 at 11:40 PM, helifu <hzhelifu@corp.netease.com> wrote:

> Hi everybody,
>
>
>
> Recently I ran a simple PHJ on ‘parquet’ and ‘kudu’ independently with
> this SQL:
>
> select count(*) from lineitem as l, orders as o where l.l_orderkey =
> o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> '1995-01-01';
>
>
>
> And I found that the ‘ProbeTime’ on ‘kudu’ is much larger than on
> ‘parquet’!!
>
>
>
> Below are the plans and profiles:
>
>
>
> Thanks in advance.
>
>
>
>
>
> 何李夫
>
> 2017-04-10 16:06:24
>
>
>
