hive-dev mailing list archives

From "Lefty Leverenz (JIRA)" <>
Subject [jira] [Commented] (HIVE-7158) Use Tez auto-parallelism in Hive
Date Fri, 13 Jun 2014 07:19:02 GMT


Lefty Leverenz commented on HIVE-7158:

Does the design doc need guidance about this (or is it time to add Tez documentation to the
user docs)?

* [Hive on Tez |]

At a minimum, Configuration Properties needs to document these parameters:

* new parameter:
* new parameter:  hive.tez.max.partition.factor
* new parameter:  hive.tez.min.partition.factor
* new default for [hive.exec.reducers.bytes.per.reducer |] (with version information)
* new default for [hive.exec.reducers.max |] (with version information)
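Here is a minimal sketch of how the two reducer settings listed above typically interact. It assumes the classic Hive heuristic of one reducer per `hive.exec.reducers.bytes.per.reducer` bytes of input, capped at `hive.exec.reducers.max`. The function name `estimate_reducers` and the input sizes are hypothetical illustrations, not the new defaults. Hive's real estimate also folds in operator statistics.

```python
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int,
                      max_reducers: int) -> int:
    """Hypothetical helper: one reducer per bytes_per_reducer of input,
    clamped to [1, max_reducers]. Assumed heuristic, not Hive's code."""
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    # Integer ceiling division avoids float rounding on large byte counts.
    wanted = -(-total_input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# Illustrative values only: 10 GiB of input, 256 MiB per reducer, cap of 1009.
print(estimate_reducers(10 * 1024**3, 256 * 1024**2, 1009))  # 40
```

Lowering the bytes-per-reducer value raises the estimate. This matters for auto-parallelism, because Tez can shrink a reducer count that turns out to be too large but cannot grow one that is too small.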

> Use Tez auto-parallelism in Hive
> --------------------------------
>                 Key: HIVE-7158
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>              Labels: TODOC14
>             Fix For: 0.14.0
>         Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch,
> Tez can optionally sample data from a fraction of the tasks of a vertex and use that information to choose the number of downstream tasks for any given scatter-gather edge.
> Hive estimates the number of reducers by looking at statistics and estimates for each operator in the operator pipeline leading up to the reducer. However, if this estimate turns out to be too large, Tez can rein in the resources used to run the reducers.
> It does so by combining partitions of the upstream vertex. It cannot, however, add reducers at this stage.
> I'm proposing to let users specify whether or not they want to use auto-parallelism. If they do, scaling factors will determine the maximum and minimum number of reducers Tez can choose from. We will then partition by the maximum number of reducers, letting Tez sample the output and rein in the count as far down as the specified minimum.
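The proposal above can be sketched in three steps, under explicit assumptions. The scaling rule in `reducer_bounds` is an assumed reading of how `hive.tez.max.partition.factor` and `hive.tez.min.partition.factor` are applied:

* the upper bound is the estimate times the max factor, capped at `hive.exec.reducers.max`;
* the lower bound is the estimate times the min factor, and is at least 1.

`group_partitions` is a simplified model of Tez merging contiguous upstream partitions. It is not Tez's actual vertex-manager code. All function names, factor values, and sizes are hypothetical.

```python
import math
from typing import List, Tuple


def reducer_bounds(estimate: int, max_factor: float, min_factor: float,
                   max_reducers: int) -> Tuple[int, int]:
    """Hypothetical scaling rule (assumption): Hive partitions by the upper
    bound, and Tez may shrink the task count as far as the lower bound."""
    upper = max(1, min(max_reducers, math.ceil(estimate * max_factor)))
    lower = max(1, min(upper, math.floor(estimate * min_factor)))
    return lower, upper


def choose_parallelism(lower: int, upper: int, estimated_bytes: int,
                       bytes_per_reducer: int) -> int:
    """Pick a task count from the estimated output size (e.g. extrapolated
    from sampled tasks), clamped to [lower, upper]. Never exceeds upper:
    Tez can combine partitions but cannot add reducers."""
    wanted = -(-estimated_bytes // bytes_per_reducer)  # ceiling division
    return max(lower, min(upper, wanted))


def group_partitions(num_partitions: int, num_tasks: int) -> List[range]:
    """Assign each task a contiguous range of upstream partitions.
    Simplified model: partitions are merged, never split or created."""
    if not 1 <= num_tasks <= num_partitions:
        raise ValueError("need 1 <= num_tasks <= num_partitions")
    base, extra = divmod(num_partitions, num_tasks)
    ranges, start = [], 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)  # spread the remainder
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Illustrative run: Hive estimates 40 reducers; factors of 2.0 and 0.25.
lower, upper = reducer_bounds(40, max_factor=2.0, min_factor=0.25,
                              max_reducers=1009)
print(lower, upper)  # 10 80

# Suppose sampling suggests ~5 GiB of shuffle output at 256 MiB per task.
tasks = choose_parallelism(lower, upper, 5 * 1024**3, 256 * 1024**2)
print(tasks)  # 20

ranges = group_partitions(upper, tasks)
print(len(ranges), list(ranges[0]))  # 20 [0, 1, 2, 3]
```

The sketch preserves the asymmetry described in the issue. The final task count can land anywhere between the two bounds, but never above the partition count Hive chose up front.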

This message was sent by Atlassian JIRA
