impala-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Behm <alex.b...@cloudera.com>
Subject Re: Debugging Impala query that consistently hangs
Date Tue, 13 Feb 2018 18:03:39 GMT
If you are willing to share, it would be great get some details on why
compute stats failed.

On Tue, Feb 13, 2018 at 7:57 AM, Piyush Narang <p.narang@criteo.com> wrote:

> Thanks Mostafa, that did help. I was able to compute stats on
> bi_dim_campaign and advertiser_event_rich. I updated the row counts
> manually for the arbitrage table. I do see the query completing
> successfully now. Will check in on the stats issue over the next couple of
> days for the bigger arbitrage table. The last time we ran into this we
> didn’t see any useful logs on the Hive server side. Our Hive team has
> updated Hive since then so I’ll try and try this out again / investigate it
> a bit more.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mmokhtar@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Monday, February 12, 2018 at 5:11 PM
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> Hi Piyush,
>
>
>
> Please run the "alter table" commands below to set the number of rows per
> table, doing that should yield better plans that what you currently have
> and avoid building a hash table that consumes lots of memory.
>
>
>
> You can also set the number of distinct values for the columns involved in
> join, aggregations and predicates.
>
>
>
> None of this manual work is needed if "compute stats foo" is run
> successfully.
>
>
>
> alter table bi_dim_campaign set tblproperties ('numRows'='213');
>
> alter table advertiser_event_rich set tblproperties ('numRows'='177000');
>
> alter table bi_arbitrage_full set tblproperties ('bi_arbitrage_full'='
> 2170000000');
>
>
>
> https://www.cloudera.com/documentation/enterprise/5-10-
> x/topics/impala_perf_stats.html#perf_table_stats_manual
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cloudera.com_documentation_enterprise_5-2D10-2Dx_topics_impala-5Fperf-5Fstats.html-23perf-5Ftable-5Fstats-5Fmanual&d=DwMGaQ&c=nxfEpP1JWHVKAq835DW4mA&r=3Ka-O_qIfLiCDaGELmIN3BcChZatNdPOwe36odQXFYo&m=PXbtfShc2kbIinJiXsC_-oGofEjSbuisWx3h4RcmMhc&s=fimpxYlR-Ji85GRCRBZj7v8QcsKquSvOOfBP_-svygA&e=>
>
> https://www.cloudera.com/documentation/enterprise/5-10-
> x/topics/impala_perf_stats.html#perf_column_stats_manual
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cloudera.com_documentation_enterprise_5-2D10-2Dx_topics_impala-5Fperf-5Fstats.html-23perf-5Fcolumn-5Fstats-5Fmanual&d=DwMGaQ&c=nxfEpP1JWHVKAq835DW4mA&r=3Ka-O_qIfLiCDaGELmIN3BcChZatNdPOwe36odQXFYo&m=PXbtfShc2kbIinJiXsC_-oGofEjSbuisWx3h4RcmMhc&s=3HNloA8f-indt0L3ZJ1D6ZaOlDxvXojgAKhabrBh6fQ&e=>
>
>
>
> P.S In tables with a large number of partitions I have seen HMS to hit
> various scalability limitations in JVM and the backing store.
>
>
>
>
>
>
>
> On Mon, Feb 12, 2018 at 6:55 PM, Tim Armstrong <tarmstrong@cloudera.com>
> wrote:
>
> Let us know if we can help figuring out what went wrong with compute stats.
>
>
>
> - Tim
>
>
>
> On Mon, Feb 12, 2018 at 6:07 AM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Got it, thanks for the explanation Tim. I’ll chase into the issue with
> compute stats for that table.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Tim Armstrong <tarmstrong@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Sunday, February 11, 2018 at 2:31 PM
>
>
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> Piyush,
>
>
>
>   I can't recommend in strong enough terms that you figure out how to get
> compute stats working. You will not have a good experience with Impala
> without statistics - there's no way you will get good plans for all your
> queries.
>
>
>
> - Tim
>
>
>
> On Fri, Feb 9, 2018 at 11:25 AM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Thanks Tim. I had issues running compute stats on some of our tables
> (calling alter table on Hive was failing and I wasn’t able to resolve it)
> and I think this was one of them. I’ll try switching over to a shuffle join
> and see if that helps.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Tim Armstrong <tarmstrong@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Friday, February 9, 2018 at 12:24 PM
>
>
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> I suspect it's busy building the hash tables in the join with id=7. If you
> drill down into the profile I suspect you'll see a bunch of time spent
> there. The top-level time counter isn't necessarily updated live for the
> time spent building the hash tables, but the fact it's using 179GB of
> memory is a big hint that it's building some big hash tables.
>
>
>
> The plan you're getting is really terrible btw. That join has > 2B rows on
> the right side and 0 rows on the left side, which is the exact opposite of
> what you what.
>
>
>
> I'd suggest running compute stats on the input tables to get a better
> plan. I suspect that will solve your problem.
>
>
>
> On Thu, Feb 8, 2018 at 12:06 PM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Yeah like I mentioned, the summary tab isn’t getting updated either:
>
> Operator             #Hosts   Avg Time   Max Time    #Rows  Est. #Rows
> Peak Mem  Est. Peak Mem  Detail
>
>
>
>
>
>
>
>
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------------------------------------------------------------
>
> 22:AGGREGATE              1  142.755us  142.755us        0           1
> 4.00 KB       10.00 MB  FINALIZE
>
>
>
>
>
>
>
>
>
>
> 21:EXCHANGE               1    0.000ns    0.000ns        0
> 1          0              0  UNPARTITIONED
>
>
>
>
>
>
>
>
>
>
> 10:AGGREGATE              1  126.704us  126.704us        0           1
> 4.00 KB       10.00 MB
>
>
>
>
>
>
>
>
>
>
> 20:AGGREGATE              1  312.493us  312.493us        0          -1
> 119.12 KB      128.00 MB  FINALIZE
>
>
>
>
>
>
>
>
>
>
> 19:EXCHANGE               1    0.000ns    0.000ns        0
> -1          0              0  HASH(day,country,…)
>
> 09:AGGREGATE              1  216.614us  216.614us        0          -1
> 119.12 KB      128.00 MB  STREAMING
>
>
>
>
>
>
>
>
>
>
> 18:AGGREGATE              1  357.918us  357.918us        0          -1
> 170.12 KB      128.00 MB  FINALIZE
>
>
>
>
>
>
>
>
>
>
> 17:EXCHANGE               1    0.000ns    0.000ns        0
> -1          0              0  HASH(…)
>
>
>
>
>
>
> 08:AGGREGATE              1   27.985us   27.985us        0          -1
> 170.12 KB      128.00 MB  STREAMING
>
>
>
>
>
>
>
>
>
>
> 07:HASH JOIN              1    0.000ns    0.000ns        0          -1
> 179.72 GB        2.00 GB  LEFT OUTER JOIN, PARTITIONED
>
>
>
>
>
>
>
>
>
>
> |--16:EXCHANGE            1      6m47s      6m47s    2.17B
> -1          0              0  HASH(user_id)
>
>
>
>
>
>
>
>
>
>
> |  05:HASH JOIN          22    8s927ms   14s258ms    2.17B          -1
> 68.07 MB        2.00 GB  LEFT OUTER JOIN, BROADCAST
>
>
>
>
>
>
>
>
>
>
> |  |--14:EXCHANGE        22   11s626ms   11s844ms    1.08M
> -1          0              0  BROADCAST
>
>
>
>
>
>
>
>
>
>
> |  |  04:SCAN HDFS        2  103.838ms  138.573ms    1.08M          -1
> 10.48 MB       96.00 MB  dim_publisher pub
>
>
>
>
>
>
>
>
>
>
> |  03:SCAN HDFS          22      8m40s      10m9s    2.17B          -1
> 1.03 GB      616.00 MB  bi_arbitrage_full a
>
>
>
>
>
>
>
>
>
>
> 15:EXCHANGE               1   22s489ms   22s489ms        0
> -1          0              0  HASH(uid)
>
>
>
>
>
>
>
>
>
>
> 06:HASH JOIN              1   51.613ms   51.613ms   88.70K          -1
> 46.04 MB        2.00 GB  INNER JOIN, BROADCAST
>
>
>
>
>
>
>
>
>
>
> |--13:EXCHANGE            1   22s928ms   22s928ms  177.30K
> -1          0              0  BROADCAST
>
>
>
>
>
>
>
>
>
>
> |  02:SCAN HDFS          22   14s311ms   21s235ms  177.30K          -1
> 798.47 MB      440.00 MB  advertiser_event_rich
>
>
>
>
>
>
>
>
>
>
> 12:AGGREGATE              1    7.971ms    7.971ms       36          -1
> 36.18 MB      128.00 MB  FINALIZE
>
>
>
>
>
>
>
>
>
>
> 11:EXCHANGE               1    1s892ms    1s892ms       56
> -1          0              0  HASH(..)
>
>
>
>
>
>
>
>
>
>
> 01:AGGREGATE              1    0.000ns    0.000ns       56          -1
> 35.73 MB      128.00 MB  STREAMING
>
>
>
>
>
>
>
>
>
>
> 00:SCAN HDFS              1    2s012ms    2s012ms      213          -1
> 3.34 MB      128.00 MB  bi_dim_campaign
>
>
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Jeszy <jeszyb@gmail.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Thursday, February 8, 2018 at 2:59 PM
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Debugging Impala query that consistently hangs
>
>
>
> Not sure that's what you're referring to, but scan progress isn't
> necessarily indicative of overall query progress. Can you attach the text
> profile of the cancelled query?
>
> If you cannot upload attachments, the Summary section is the best starting
> point, so please include that.
>
>
>
> On 8 February 2018 at 20:53, Piyush Narang <p.narang@criteo.com> wrote:
>
> Hi folks,
>
>
>
> I have a query that I’m running on Impala that seems to consistently stop
> making progress after reaching 45-50%. It stays at that split number for a
> couple of hours (before I cancel it).  I don’t see any progress on the
> summary page either. I’m running 2.11.0-cdh5.14.0 RELEASE (build
> d68206561bce6b26762d62c01a78e6cd27aa7690). It seems to not be making
> progress from an exchange hash step.
>
> Has anyone run into this in the past? Any suggestions on what’s the best
> way to debug this? (I could take stack dumps on the coordinator / workers,
> but not sure if there’s any other way).
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
>
>
>
>
>
>
>
>

Mime
View raw message