Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CA8FF10EA8 for ; Mon, 1 Dec 2014 06:44:47 +0000 (UTC) Received: (qmail 87950 invoked by uid 500); 1 Dec 2014 06:44:46 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 87871 invoked by uid 500); 1 Dec 2014 06:44:46 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 87860 invoked by uid 99); 1 Dec 2014 06:44:46 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Dec 2014 06:44:46 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of viral.bajaria@gmail.com designates 209.85.213.170 as permitted sender) Received: from [209.85.213.170] (HELO mail-ig0-f170.google.com) (209.85.213.170) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Dec 2014 06:44:42 +0000 Received: by mail-ig0-f170.google.com with SMTP id r2so14015658igi.3 for ; Sun, 30 Nov 2014 22:44:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=TuLSPrzCUOnOGWnCe3eNS3xUjurzG4EWKlRpM25UQ1E=; b=BC1CUCSHJ8HXSLX/d6TX2JBU1wG7URahoVRkY+fWt5cGhm/IdQCo6JBlVQCBoxCq4G F6I49AK0ENsNHdOnyrAkAqh4FUnO+4zhkFFSiv7G2+tKPfCY+d5d4U6XZK37P2j26aIc UgFBYrNd22JS0T8dpIK/U58a5f7KaRWEU2ZhAecn4/XGY2n4tBAaXnMoai+aBdd65Pu9 ltrqvO6aQXA2SutMaTuO4uTBAy8vVLUgwPL2wPkkLkSu1jF/0AtX5X12ajghUz9LaaWm Ux/h0vxo6O4TtweNE34WpJJmkgJB7g7P1Y8OgFPkpr1fdYrv8shUFuW2IKwgjhFEpnzL SISg== MIME-Version: 1.0 X-Received: by 10.107.131.133 with SMTP id n5mr49165990ioi.30.1417416261395; Sun, 30 Nov 2014 22:44:21 -0800 (PST) Received: by 10.107.134.168 with HTTP; Sun, 30 Nov 2014 22:44:21 -0800 (PST) In-Reply-To: References: Date: Sun, 30 Nov 2014 22:44:21 -0800 Message-ID: Subject: Re: issue with hive wide tables/views From: Viral Bajaria To: "user@hive.apache.org" Content-Type: multipart/alternative; boundary=001a113ebbf8c26d90050921f0eb X-Virus-Checked: Checked by ClamAV on apache.org --001a113ebbf8c26d90050921f0eb Content-Type: text/plain; charset=UTF-8 Any help will be appreciate here. This issue becomes a bigger pain when you have a VIEW referencing another VIEW(s) which have 1000s of columns. It seems the generation of the query plan has some un-optimized code path when there are 1000s of columns. A jstack of a running process ( > 30 minutes ) shows this: https://gist.github.com/vbajaria/2b46eb015eb5f97954fc I ran jstack multiple times on the running process and everytime the stack trace of the SemanticAnalyzer propped up with the same results, hence I am guessing that the underlying issue could be in there. Let me know if any more details are needed to get any help on this. Will it benefit if I reached out to the dev list for this ? Thanks, Viral On Wed, Nov 26, 2014 at 11:21 AM, Viral Bajaria wrote: > Hi, > > I have a table which ended up having 3K+ columns. The building of the > table wasn't that painful, but the part where things suck is when creating > VIEWs on top of that table. > > 1 of the views that I want to create needs complex operation and > references a ton of columns or almost all of the columns. > > When applying this view to hive, it takes over 25 minutes for the view > definition to get applied. Acceptable if the view didn't need frequent > updates, but not acceptable if we plan to change the view often or have > multiple such views. > > So the questions: > 1) Should it take so long for hive to create a view that has so many > columns ? If not, should we open a JIRA and investigate this issue ? > 2) The underlying tables are CSV (raw data) or ORC (after some > processing)... would we benefit if we change it from 3K+ columns to a > single column containing List column or Map for all > the values and then use the required columns > > We are on Hive 0.13.0 and our metastore is backed by MariaDB 10 > > Thanks, > Viral > > --001a113ebbf8c26d90050921f0eb Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Any help will be appreciate here.

This = issue becomes a bigger pain when you have a VIEW referencing another VIEW(s= ) which have 1000s of columns.

It seems the genera= tion of the query plan has some un-optimized code path when there are 1000s= of columns.

A jstack of a running process ( > = 30 minutes ) shows this:=C2=A0https://gist.github.com/vbajaria/2b46eb015eb5f97954fc<= /a>





On Wed, Nov 26, 2014 at 11:21 AM, Viral Bajaria <viral.= bajaria@gmail.com> wrote:
<= div dir=3D"ltr">Hi,

I have a table which ended up having= 3K+ columns. The building of the table wasn't that painful, but the pa= rt where things suck is when creating VIEWs on top of that table.

1 of the views that I want to create needs complex operatio= n and references a ton of columns or almost all of the columns.
<= br>
When applying this view to hive, it takes over 25 minutes for= the view definition to get applied. Acceptable if the view didn't need= frequent updates, but not acceptable if we plan to change the view often o= r have multiple such views.

So the questions:
1) Should it take so long for hive to create a view that has so many = columns ? If not, should we open a JIRA and investigate this issue ?
<= div>2) The underlying tables are CSV (raw data) or ORC (after some processi= ng)... would we benefit if we change it from 3K+ columns to a single column= containing List<Object> column or Map<String, Object> for all = the values and then use the required columns=C2=A0

We are on Hive 0.13.0 and our metastore is backed by MariaDB 10
=
Thanks,
Viral


--001a113ebbf8c26d90050921f0eb--