hive-user mailing list archives

From Viral Bajaria <>
Subject Re: issue with hive wide tables/views
Date Mon, 01 Dec 2014 06:44:21 GMT
Any help would be appreciated here.

This issue becomes a bigger pain when you have a VIEW referencing one or
more other VIEWs that each have 1000s of columns.

It seems the query plan generation hits an unoptimized code path when
there are 1000s of columns.
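
For anyone who wants to reproduce this, here is a minimal sketch that
generates DDL for a wide table plus a chain of views on top of it. The
names (wide_t, wide_v1, c0, c1, ...), the STRING column type and the
upper() expression are placeholders I made up, not the real schema, and I
have not confirmed that this exact shape triggers the slowdown.

```python
def wide_table_ddl(table, n_cols):
    """CREATE TABLE statement with n_cols STRING columns (c0, c1, ...)."""
    cols = ",\n  ".join(f"c{i} STRING" for i in range(n_cols))
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS ORC;"


def view_ddl(view, source, n_cols):
    """CREATE VIEW statement that references every column of `source`."""
    # upper() is only a stand-in for the "complex operation" in the real view.
    exprs = ",\n  ".join(f"upper(c{i}) AS c{i}" for i in range(n_cols))
    return f"CREATE VIEW {view} AS\nSELECT\n  {exprs}\nFROM {source};"


def repro_script(n_cols=3000, depth=2):
    """One wide table followed by `depth` views, each built on the previous one."""
    stmts = [wide_table_ddl("wide_t", n_cols)]
    source = "wide_t"
    for level in range(1, depth + 1):
        view = f"wide_v{level}"
        stmts.append(view_ddl(view, source, n_cols))
        source = view
    return "\n\n".join(stmts)
```

Writing the output of repro_script() to a file and running it with
hive -f should give a rough timing for each CREATE VIEW at different
column counts.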

A jstack of a process that had been running for more than 30 minutes shows this:

I ran jstack multiple times on the running process, and every time the
SemanticAnalyzer stack trace came up with the same frames, so I am
guessing that the underlying issue is in there.
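
The repeated jstack runs can be automated roughly like this. It is only a
sketch: it assumes jstack is on the PATH, that the thread of interest is
named "main" (which I believe is the case for the Hive CLI, but check
your own dump), and the function names are my own.

```python
import collections
import re
import subprocess
import time

# Matches stack frame lines such as "\tat pkg.Class.method(File.java:12)".
FRAME_RE = re.compile(r"^\s+at\s+([\w.$<>/@]+)\(")


def thread_frames(dump, thread_name="main"):
    """Return the frames (top first) of one named thread in a jstack dump."""
    frames, in_thread = [], False
    for line in dump.splitlines():
        if line.startswith('"'):
            # A thread header line; track whether it is the thread we want.
            in_thread = line.startswith(f'"{thread_name}"')
        elif in_thread:
            match = FRAME_RE.match(line)
            if match:
                frames.append(match.group(1))
    return frames


def sample(pid, samples=5, interval=10.0, depth=3):
    """Run jstack several times and count how often each top-of-stack repeats."""
    counts = collections.Counter()
    for i in range(samples):
        dump = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        ).stdout
        counts[tuple(thread_frames(dump, "main")[:depth])] += 1
        if i < samples - 1:
            time.sleep(interval)
    return counts
```

If one tuple of frames dominates the counts across samples, that is a
reasonable hint about where the time is going, though it is not proof.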

Let me know if any more details are needed to help with this. Would it
help if I reached out to the dev list about this?


On Wed, Nov 26, 2014 at 11:21 AM, Viral Bajaria <>

> Hi,
> I have a table which ended up having 3K+ columns. The building of the
> table wasn't that painful, but the part where things suck is when creating
> VIEWs on top of that table.
> One of the views that I want to create needs complex operations and
> references a ton of columns, or almost all of them.
> When applying this view to Hive, it takes over 25 minutes for the view
> definition to get applied. Acceptable if the view didn't need frequent
> updates, but not acceptable if we plan to change the view often or have
> multiple such views.
> So the questions:
> 1) Should it take so long for Hive to create a view that has so many
> columns? If not, should we open a JIRA and investigate this issue?
> 2) The underlying tables are CSV (raw data) or ORC (after some
> processing). Would we benefit from changing the 3K+ columns to a
> single column of type List<Object> or Map<String, Object> holding all
> the values, and then extracting the required columns from it?
> We are on Hive 0.13.0 and our metastore is backed by MariaDB 10.
> Thanks,
> Viral
