impala-user mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Debugging Impala query that consistently hangs
Date Mon, 12 Feb 2018 16:55:47 GMT
Let us know if we can help figure out what went wrong with compute stats.

- Tim

On Mon, Feb 12, 2018 at 6:07 AM, Piyush Narang <p.narang@criteo.com> wrote:

> Got it, thanks for the explanation Tim. I’ll chase down the issue with
> compute stats for that table.
>
>
>
> -- Piyush
>
> *From: *Tim Armstrong <tarmstrong@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Sunday, February 11, 2018 at 2:31 PM
>
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> Piyush,
>
>
>
>   I can't recommend strongly enough that you figure out how to get
> compute stats working. You will not have a good experience with Impala
> without statistics - there's no way you will get good plans for all your
> queries.
>
>
>
> - Tim
>
>
>
> On Fri, Feb 9, 2018 at 11:25 AM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Thanks Tim. I had issues running compute stats on some of our tables
> (calling alter table on Hive was failing and I wasn’t able to resolve it)
> and I think this was one of them. I’ll try switching over to a shuffle join
> and see if that helps.
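(For reference: switching to a shuffle join can be done with an Impala join hint. This is a sketch, not the thread's actual query; the table and column names below are made up for illustration.)

```shell
# Force a partitioned (shuffle) join with a hint placed right after JOIN.
# big_table / small_table / user_id are illustrative names only.
impala-shell -q "
SELECT a.user_id, b.name
FROM big_table a
JOIN /* +SHUFFLE */ small_table b
  ON a.user_id = b.user_id"
```

The hint overrides the planner's broadcast-vs-partitioned choice for that one join, which is a workaround when missing stats lead to a bad strategy.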
>
>
>
> -- Piyush
>
> *From: *Tim Armstrong <tarmstrong@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Friday, February 9, 2018 at 12:24 PM
>
>
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> I suspect it's busy building the hash tables in the join with id=7. If you
> drill down into the profile I suspect you'll see a bunch of time spent
> there. The top-level time counter isn't necessarily updated live for the
> time spent building the hash tables, but the fact it's using 179GB of
> memory is a big hint that it's building some big hash tables.
>
>
>
> The plan you're getting is really terrible, btw. That join has > 2B rows on
> the right side and 0 rows on the left side, which is the exact opposite of
> what you want.
>
>
>
> I'd suggest running compute stats on the input tables to get a better
> plan. I suspect that will solve your problem.
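(For reference: computing and then verifying stats looks roughly like the sketch below. The database/table name is a placeholder, not one of the tables in this thread.)

```shell
# Compute table and column statistics so the planner can size each join
# side correctly. my_db.my_table is an illustrative name.
impala-shell -q "COMPUTE STATS my_db.my_table"

# Verify the stats were recorded: the #Rows column should no longer be -1.
impala-shell -q "SHOW TABLE STATS my_db.my_table"
```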
>
>
>
> On Thu, Feb 8, 2018 at 12:06 PM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Yeah, like I mentioned, the summary tab isn’t getting updated either:
>
> Operator             #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------------------------
> 22:AGGREGATE              1  142.755us  142.755us        0           1    4.00 KB       10.00 MB  FINALIZE
> 21:EXCHANGE               1    0.000ns    0.000ns        0           1          0              0  UNPARTITIONED
> 10:AGGREGATE              1  126.704us  126.704us        0           1    4.00 KB       10.00 MB
> 20:AGGREGATE              1  312.493us  312.493us        0          -1  119.12 KB      128.00 MB  FINALIZE
> 19:EXCHANGE               1    0.000ns    0.000ns        0          -1          0              0  HASH(day,country,…)
> 09:AGGREGATE              1  216.614us  216.614us        0          -1  119.12 KB      128.00 MB  STREAMING
> 18:AGGREGATE              1  357.918us  357.918us        0          -1  170.12 KB      128.00 MB  FINALIZE
> 17:EXCHANGE               1    0.000ns    0.000ns        0          -1          0              0  HASH(…)
> 08:AGGREGATE              1   27.985us   27.985us        0          -1  170.12 KB      128.00 MB  STREAMING
> 07:HASH JOIN              1    0.000ns    0.000ns        0          -1  179.72 GB        2.00 GB  LEFT OUTER JOIN, PARTITIONED
> |--16:EXCHANGE            1      6m47s      6m47s    2.17B          -1          0              0  HASH(user_id)
> |  05:HASH JOIN          22    8s927ms   14s258ms    2.17B          -1   68.07 MB        2.00 GB  LEFT OUTER JOIN, BROADCAST
> |  |--14:EXCHANGE        22   11s626ms   11s844ms    1.08M          -1          0              0  BROADCAST
> |  |  04:SCAN HDFS        2  103.838ms  138.573ms    1.08M          -1   10.48 MB       96.00 MB  dim_publisher pub
> |  03:SCAN HDFS          22      8m40s      10m9s    2.17B          -1    1.03 GB      616.00 MB  bi_arbitrage_full a
> 15:EXCHANGE               1   22s489ms   22s489ms        0          -1          0              0  HASH(uid)
> 06:HASH JOIN              1   51.613ms   51.613ms   88.70K          -1   46.04 MB        2.00 GB  INNER JOIN, BROADCAST
> |--13:EXCHANGE            1   22s928ms   22s928ms  177.30K          -1          0              0  BROADCAST
> |  02:SCAN HDFS          22   14s311ms   21s235ms  177.30K          -1  798.47 MB      440.00 MB  advertiser_event_rich
> 12:AGGREGATE              1    7.971ms    7.971ms       36          -1   36.18 MB      128.00 MB  FINALIZE
> 11:EXCHANGE               1    1s892ms    1s892ms       56          -1          0              0  HASH(..)
> 01:AGGREGATE              1    0.000ns    0.000ns       56          -1   35.73 MB      128.00 MB  STREAMING
> 00:SCAN HDFS              1    2s012ms    2s012ms      213          -1    3.34 MB      128.00 MB  bi_dim_campaign
>
> -- Piyush
>
> *From: *Jeszy <jeszyb@gmail.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Thursday, February 8, 2018 at 2:59 PM
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> Not sure that's what you're referring to, but scan progress isn't
> necessarily indicative of overall query progress. Can you attach the text
> profile of the cancelled query?
>
> If you cannot upload attachments, the Summary section is the best starting
> point, so please include that.
>
>
>
> On 8 February 2018 at 20:53, Piyush Narang <p.narang@criteo.com> wrote:
>
> Hi folks,
>
>
>
> I have a query that I’m running on Impala that seems to consistently stop
> making progress after reaching 45-50%. It stays at that split number for a
> couple of hours (before I cancel it). I don’t see any progress on the
> summary page either. I’m running 2.11.0-cdh5.14.0 RELEASE (build
> d68206561bce6b26762d62c01a78e6cd27aa7690). It seems to be stuck at an
> exchange hash step.
>
> Has anyone run into this in the past? Any suggestions on what’s the best
> way to debug this? (I could take stack dumps on the coordinator / workers,
> but not sure if there’s any other way).
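(One lighter-weight alternative to stack dumps is the impalad debug web UI. A sketch, assuming the default debug port 25000 and a placeholder hostname; substitute your coordinator's address.)

```shell
# List in-flight and recently completed queries on the coordinator.
# coordinator-host is a placeholder; 25000 is the default debug web UI port.
curl -s http://coordinator-host:25000/queries

# Fetch the live profile for a specific query (query id comes from the
# listing above; <query_id> is left as a placeholder here).
curl -s "http://coordinator-host:25000/query_profile?query_id=<query_id>"
```

The live profile shows per-operator counters while the query runs, which helps pinpoint which fragment is stuck without touching the worker processes.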
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
