impala-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mostafa Mokhtar <mmokh...@cloudera.com>
Subject Re: Issues running compute incremental stats in Impala - alter not possible
Date Wed, 03 Jan 2018 19:13:31 GMT
When you looked at the Web-UI while compute stats was running what did you
find?

In general compute stats has 3 phases

   1. Get row count per partition using something like select count(*),
   partition_id from foo group by partition_id
   2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size
   (CN), Avg size (CN) from foo
   3. Persist the captured stats in Hive Meta store


For steps #1 and #2 you should be able to get a good read on progress using
the "Scan Progress" column.
Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the
progress there.

And if the table is large in terms of on disk size I expect steps #2 to
dominate the time.

If the operation is CPU not IO bound increasing mt_dop should give you good
speedup, yet I don't recommend a value greater than 16.


On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p.narang@criteo.com> wrote:

> Thanks Alex and Mostafa. I tried running compute stats full a couple of
> weeks back on this table and it was still going 4 hours later (and I didn’t
> see any progress indication on the Impala web UI). I’ll try and hunt
> through the Hive metastore log files and see if I can find anything.
>
>
>
> Is there something specific you’re looking for in the show create table
> output? I can dump that here (rather than the full table definition and
> details which is pretty verbose and I might need to check if it’s ok to
> share externally).
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mmokhtar@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <user@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 1:48 PM
> *To: *"user@impala.apache.org" <user@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Also check the Hive Metastore log files.
>
>
>
> In general if the table has a large number of partitions incremental stats
> will have very large overhead in terms of metadata.
>
>
>
> I would recommend running "compute stats bi_ full" then manually set the
> row count for newly added partitions whenever possible.
>
>
>
> On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <alex.behm@cloudera.com>
> wrote:
>
> Thanks for the report. I have not seen this issue. Looks like the alter
> RPC is rejected by the Hive Metastore. Maybe looking into the
> Hive/Metastore logs would help.
>
>
>
> The SHOW CREATE TABLE output might also help us debug.
>
>
>
> On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p.narang@criteo.com>
> wrote:
>
> Hi folks,
>
>
>
> I’m running into some issues when I try to compute incremental stats in
> Impala that I was hoping someone would be able to help with. I’m able to
> ‘compute stats’ in Impala on my smaller tables just fine. When I try
> computing stats incrementally for one of my larger tables, I seem to be
> running into this error:
>
> > compute incremental stats bi_ full partition (param1=0,day='2017-10-04',
> hour=00,host_platform='EU');
>
> Query: compute incremental stats bi_full partition
> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>
> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
> Hive Metastore:
>
> CAUSED BY: InvalidOperationException: alter is not possible
>
>
>
> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
> details. I verified that I’m the owner of the tables in HDFS.
>
>
>
> Has anyone run into this issue in the past? Any workarounds?
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
>
>

Mime
View raw message