Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7110D160C05 for ; Wed, 3 Jan 2018 20:14:17 +0100 (CET) Received: (qmail 40791 invoked by uid 500); 3 Jan 2018 19:14:16 -0000 Mailing-List: contact user-help@impala.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@impala.apache.org Delivered-To: mailing list user@impala.apache.org Received: (qmail 40781 invoked by uid 99); 3 Jan 2018 19:14:16 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Jan 2018 19:14:16 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 2A522180582 for ; Wed, 3 Jan 2018 19:14:16 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.879 X-Spam-Level: * X-Spam-Status: No, score=1.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=cloudera.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id ovfZ9lTA7tHG for ; Wed, 3 Jan 2018 19:14:14 +0000 (UTC) Received: from mail-it0-f49.google.com (mail-it0-f49.google.com [209.85.214.49]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 8CABC5FAC9 for ; Wed, 3 Jan 2018 19:14:13 +0000 (UTC) Received: by mail-it0-f49.google.com with SMTP id b5so2809893itc.3 for ; Wed, 03 Jan 2018 11:14:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudera.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=Ms8OmvnFiw6gLdy/tUO0qCznO9j7dJuAGOUjB3vrwVY=; b=FCfxKnxGOZxVgVo9U2Zb2N9rPQG7Rn+5k7/XfP36B8txoMscBxS/DFWmXZ+N9W5Zj5 wYhOCCM9UQJOvB70MqnNjgQNG/pX+yxeI9h29fnqpbo/PD/dBeY3HdNdpXuUrtk99bWP VKAjyFi3mhryPdbCfc60xc0J+HF4kZBW8uu5PTQO7PfZb2Nt1VFAyV06QsyIu7AtDUVA SI0shy1qO1YHoRqcTHfvLYRZ9DgzijC1KW+EUtvTbHKy32+r1kVnh5w2w57r5WCPIxB5 g+Nfcc+XcrBO1eVZVty//Lh4EXgdhkw72ZG2fCFMBJ6BmBDz1b9VzlYKbIjxPnKIupJk z/0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=Ms8OmvnFiw6gLdy/tUO0qCznO9j7dJuAGOUjB3vrwVY=; b=kj6mkQnZhCinF4VxhJYIdsUp6vsb9367OiofkjfovlQ5eO9Gv0HHk0HpmrfMae4/gd q7NzGp0a7XiKY6iRZ9OX0TBImZo5F9J+xoACVPvxDy/6uQBfUXUy6xs4BQAjfcYQGQLt vkqichdSgNVIhaH4VYuRQ0pUgXeOJzlAIA3+UScvpIcpdlxI3S/w/KZmftQwEHDE3zJx Hud/gHkC21nmQWbXkTXJmfQV+TRb2tnsImEJ3ncFHiu0emA7Q4w6r96WAyA/ByMTikOv lVwKqNP8fJLOOGAIIValvCyvpxAfxFyFtwQ9Z/4zf3PElLUN8upRnMG/pi6kuPi2/Q3l o9KA== X-Gm-Message-State: AKGB3mJk4ADuRHqTHDsSg+R1BcMafrUoC7vbHFcLI7XWqTBn/DJ4ryTC t/pzf3FwvzskKGwlXQ95iJ03Rq00zoSNU0H3HNpPZRQW X-Google-Smtp-Source: ACJfBosh+Vgrhvh3x7P6ZL+TP9SO/8qDr5h0VkwZ0y/4A0gKsyoDpHvHQ8Q4bIwIhtmHgjf06wvdji2rPfYLciUznbQ= X-Received: by 10.36.17.15 with SMTP id 15mr3033566itf.53.1515006851897; Wed, 03 Jan 2018 11:14:11 -0800 (PST) MIME-Version: 1.0 Received: by 10.79.147.209 with HTTP; Wed, 3 Jan 2018 11:13:31 -0800 (PST) In-Reply-To: <11C4A403-F4D9-4101-8319-7EA82DE6F4BA@criteo.com> References: <11C4A403-F4D9-4101-8319-7EA82DE6F4BA@criteo.com> From: Mostafa Mokhtar Date: Wed, 3 Jan 2018 11:13:31 -0800 Message-ID: Subject: Re: Issues running compute incremental stats in Impala - alter not possible To: user@impala.apache.org Content-Type: multipart/alternative; boundary="001a11445d683d7f330561e405ff" archived-at: Wed, 03 Jan 2018 19:14:18 -0000 --001a11445d683d7f330561e405ff Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable When you looked at the Web-UI while compute stats was running what did you find? In general compute stats has 3 phases 1. Get row count per partition using something like select count(*), partition_id from foo group by partition_id 2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size (CN), Avg size (CN) from foo 3. Persist the captured stats in Hive Meta store For steps #1 and #2 you should be able to get a good read on progress using the "Scan Progress" column. Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the progress there. And if the table is large in terms of on disk size I expect steps #2 to dominate the time. If the operation is CPU not IO bound increasing mt_dop should give you good speedup, yet I don't recommend a value greater than 16. On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang wrote: > Thanks Alex and Mostafa. I tried running compute stats full a couple of > weeks back on this table and it was still going 4 hours later (and I didn= =E2=80=99t > see any progress indication on the Impala web UI). I=E2=80=99ll try and h= unt > through the Hive metastore log files and see if I can find anything. > > > > Is there something specific you=E2=80=99re looking for in the show create= table > output? I can dump that here (rather than the full table definition and > details which is pretty verbose and I might need to check if it=E2=80=99s= ok to > share externally). > > > > -- Piyush > > > > > > *From: *Mostafa Mokhtar > *Reply-To: *"user@impala.apache.org" > *Date: *Wednesday, January 3, 2018 at 1:48 PM > *To: *"user@impala.apache.org" > *Subject: *Re: Issues running compute incremental stats in Impala - alter > not possible > > > > Also check the Hive Metastore log files. > > > > In general if the table has a large number of partitions incremental stat= s > will have very large overhead in terms of metadata. > > > > I would recommend running "compute stats bi_ full" then manually set the > row count for newly added partitions whenever possible. > > > > On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm > wrote: > > Thanks for the report. I have not seen this issue. Looks like the alter > RPC is rejected by the Hive Metastore. Maybe looking into the > Hive/Metastore logs would help. > > > > The SHOW CREATE TABLE output might also help us debug. > > > > On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang > wrote: > > Hi folks, > > > > I=E2=80=99m running into some issues when I try to compute incremental st= ats in > Impala that I was hoping someone would be able to help with. I=E2=80=99m = able to > =E2=80=98compute stats=E2=80=99 in Impala on my smaller tables just fine.= When I try > computing stats incrementally for one of my larger tables, I seem to be > running into this error: > > > compute incremental stats bi_ full partition (param1=3D0,day=3D'2017-10= -04', > hour=3D00,host_platform=3D'EU'); > > Query: compute incremental stats bi_full partition > (param1=3D0,day=3D'2017-10-04',hour=3D00,host_platform=3D'EU') > > WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to > Hive Metastore: > > CAUSED BY: InvalidOperationException: alter is not possible > > > > Looking at impalad.INFO and catalogd.INFO I don=E2=80=99t see any additio= nal > details. I verified that I=E2=80=99m the owner of the tables in HDFS. > > > > Has anyone run into this issue in the past? Any workarounds? > > > > Thanks, > > > > -- Piyush > > > > > > > --001a11445d683d7f330561e405ff Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
When you looked at the Web-UI while compute stats was= running what did you find?

In general compute sta= ts has 3 phases
  1. Get row count per partition using somethi= ng like select count(*), partition_id from foo group by partition_id
  2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size (CN),= Avg size (CN) from foo
  3. Persist the captured stats in Hive Meta= store=C2=A0

For steps #1 and #2 you sho= uld be able to get a good read on progress using the "Scan Progress&qu= ot; column.
Step 3 if you tail=C2=A0/var/log/catalogd/catalogd.IN= FO you should see the progress there.=C2=A0

And if= the table is large in terms of on disk size I expect steps #2 to dominate = the time.

If the operation is CPU not IO bound inc= reasing mt_dop should give you good speedup, yet I don't recommend a va= lue greater than 16.=C2=A0


On Wed, Jan 3, 2018 at 11:03 AM, Piyush= Narang <p.narang@criteo.com> wrote:

Thanks Alex and Mostafa. I tried running compute stats full a couple of we= eks back on this table and it was still going 4 hours later (and I didn=E2= =80=99t see any progress indication on the Impala web UI). I=E2=80=99ll try and hunt through the Hive metastore log files an= d see if I can find anything.

=C2=A0

Is there something specific you=E2=80=99re looking for in the show create = table output? I can dump that here (rather than the full table definition a= nd details which is pretty verbose and I might need to check if it=E2=80=99s ok to share externally).

=C2=A0

--=C2=A0Piyush

=C2=A0

=C2=A0

F= rom: Mostafa Mokhtar <mmokhtar@cloudera.c= om>
Reply-To: "user@impala.apache.org" <user@impala.apache.org>
Date: Wednesday, January 3, 2018 at 1:48 PM
To: "user@impala.apache.org" <user@impala.apache.org>
Subject: Re: Issues running compute incremental stats in Impala - al= ter not possible

=C2=A0

Also check the Hive Metastore log files. <= /u>

=C2=A0

In general if the table has a large number of partit= ions incremental stats will have very large overhead in terms of metadata.<= u>

=C2=A0

I would recommend running "compute stats bi_ full" then manually set the row count f= or newly added partitions whenever possible.

=C2=A0

On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <= alex.behm@cloud= era.com> wrote:

Thanks for the report. I have not seen this issue. L= ooks like the alter RPC is rejected by the Hive Metastore. Maybe looking in= to the Hive/Metastore logs would help.

=C2=A0

The SHOW CREATE TABLE output might also help us debu= g.

=C2=A0

On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <<= a href=3D"mailto:p.narang@criteo.com" target=3D"_blank">p.narang@criteo.com= > wrote:

Hi folks,=

=C2=A0

I=E2=80=99m running= into some issues when I try to compute incremental stats in Impala that I = was hoping someone would be able to help with. I=E2=80=99m able to =E2=80= =98compute stats=E2=80=99 in Impala on my smaller tables just fine. When I try comput= ing stats incrementally for one of my larger tables, I seem to be running i= nto this error:

> compute increm= ental stats bi_ full partition (param1=3D0,day=3D'2017-10-04',= hour=3D00,host_platform=3D'EU');

Query: compute incr= emental stats bi_full partition (param1=3D0,day=3D'2017-10-04',hour=3D00,host_platform=3D'EU')

WARNINGS: ImpalaRun= timeException: Error making 'alter_partitions' RPC to Hive Metastor= e:

CAUSED BY: InvalidO= perationException: alter is not possible

=C2=A0

Looking at impalad.= INFO and catalogd.INFO I don=E2=80=99t see any additional details. I verifi= ed that I=E2=80=99m the owner of the tables in HDFS.

=C2=A0

Has anyone run into= this issue in the past? Any workarounds?

=C2=A0

Thanks,

=C2= =A0

--=C2=A0Piyush<= u>

=C2=A0

=C2=A0

=C2=A0


--001a11445d683d7f330561e405ff--