hive-user mailing list archives

From Ravi Teja Chilukuri
Subject Hive on Tez: Tez taking nX more containers than Mapreduce for union all
Date Thu, 16 Mar 2017 09:08:45 GMT

We are migrating our Hive queries from MapReduce to Tez.
One of our queries uses union all with group by, and the same table is read
multiple times in the union all subqueries.
We have noticed an issue with Tez here: it runs with about k times as many tasks
as MR, where k is the number of union alls in the query.

When run with MapReduce, the job runs in a single stage consuming *n* mappers
and *m* reducers, and all *union all* scans are done within that same job.

But when it runs with Tez, a separate map vertex is launched for each union all,
and each vertex has *n* tasks.
Hence, if there are 50 union alls in a query, 50n map tasks are launched,
which is huge.
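
To make the arithmetic concrete, here is a rough sketch of the two task counts as described above. The function names and the value of n are made up for illustration only; the only inputs taken from our case are "one vertex of n tasks per union all" and k = 50.

```python
def mr_map_tasks(n: int) -> int:
    """Mappers used by the single MapReduce stage: all scans share n mappers."""
    return n


def tez_map_tasks(n: int, k: int) -> int:
    """Map tasks on Tez when each of the k union alls gets its own vertex of n tasks."""
    return k * n


if __name__ == "__main__":
    n = 200  # hypothetical number of mappers for one scan of the table
    k = 50   # number of union alls, as in the example above
    print("MapReduce map tasks:", mr_map_tasks(n))
    print("Tez map tasks:", tez_map_tasks(n, k))
    print("Ratio:", tez_map_tasks(n, k) // mr_map_tasks(n))
```

The ratio is always k, regardless of n, which is why the container usage grows with the number of union alls rather than with the size of the table.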

As a result, this query occupies far more containers on Tez than on
MapReduce, and we have hit a roadblock with union queries on Tez.

Any help in this regard is appreciated.

Sample query:

Thanks in advance,
