hive-user mailing list archives

From Ravi Teja Chilukuri <raviort...@gmail.com>
Subject Hive on Tez: Tez taking nX more containers than Mapreduce for union all
Date Thu, 16 Mar 2017 09:08:45 GMT
Hi,

We are migrating our Hive queries from MapReduce to Tez.
One of our queries uses union all with group by, and the same table is read
multiple times across the union all subqueries.
We have noticed an issue with Tez here: it runs k times as many map tasks
as MR, where k is the number of union alls in the query.


When run with MapReduce, the job runs in a single stage with *n* mappers
and *m* reducers, and all the *union all* scans are done within that same job.

But when it runs with Tez, a separate map vertex is launched for each union
all, and each vertex has *n* tasks.
Hence, if there are 50 union alls in a query, 50n map tasks are launched,
which is huge.
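
To make the arithmetic concrete, here is a minimal Python sketch of the map task counts described above. The value n = 20 is a made-up figure for illustration; only the 50 union alls come from the example, and the two functions are hypothetical helpers, not part of Hive or Tez.

```python
def map_tasks_mr(n: int) -> int:
    """Map tasks under MapReduce: one job scans all union all branches."""
    return n


def map_tasks_tez(n: int, k: int) -> int:
    """Map tasks under Tez as observed: one map vertex of n tasks per union all."""
    return k * n


n = 20  # assumed number of mappers for one scan of the table (hypothetical)
k = 50  # number of union alls, as in the example above

print(map_tasks_mr(n), map_tasks_tez(n, k))  # prints "20 1000"
```

With these assumed numbers, the same query needs 20 map tasks on MapReduce and 1000 on Tez.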

As a result, running this query with Tez occupies far more containers than
MapReduce does, and we have hit a roadblock with union queries on Tez.

Any help in this regard is appreciated.


Sample query:
http://pastebin.com/u7Rw6Hag

Thanks in advance,
Ravi
