hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7277) how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?
Date Tue, 24 Jun 2014 05:52:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041733#comment-14041733 ]

Gopal V commented on HIVE-7277:
-------------------------------

Yes, this is how HIVE-7158 works.

Reducer counts are estimated at runtime from map-phase counters.

> how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: wangmeng
>             Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers just by
> "input size of the map stage / hive.exec.reducers.bytes.per.reducer" (default 1G)...
> But I think the output size of the map stage may differ greatly from the original
> input size, so this strategy for deciding the number of reducers may be improper.
> So is there any feature that can decide the number of reducers according to the
> output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins just after some map tasks have
> finished rather than after the whole map stage has finished, so I think it is also
> improper to decide the number of reducers only once the whole map stage has finished.
> As someone pointed out, we could estimate the total number of reducers from the output
> size of the earliest map tasks that have finished. However, Hive now uses filter
> pushdown (WHERE), which may result in a big difference in output size between map tasks.
> So this estimation is improper.
> Thanks.
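
For illustration only, the arithmetic discussed in the quoted description can be sketched
as below. This is not Hive's code and not the HIVE-7158 implementation. The function names,
the max_reducers cap of 999, the byte value used for "1G", and the sample sizes are all
assumptions made for the example.

```python
import math

# "1G" as quoted in the issue; the exact byte value is an assumption.
BYTES_PER_REDUCER = 1_000_000_000


def estimate_reducers(stage_bytes, bytes_per_reducer=BYTES_PER_REDUCER, max_reducers=999):
    """Ceiling of stage_bytes / bytes_per_reducer, clamped to [1, max_reducers].

    max_reducers is a hypothetical cap chosen for this sketch.
    """
    wanted = math.ceil(stage_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def extrapolate_map_output(finished_output_bytes, total_map_tasks):
    """Scale the mean output of the finished map tasks up to the whole map stage."""
    if not finished_output_bytes:
        raise ValueError("need at least one finished map task")
    mean = sum(finished_output_bytes) / len(finished_output_bytes)
    return int(mean * total_map_tasks)


# Hypothetical job: 50 GB read by the map stage, 2 GB emitted after filtering.
map_input_bytes = 50 * 10**9
map_output_bytes = 2 * 10**9

print(estimate_reducers(map_input_bytes))   # sized from map input
print(estimate_reducers(map_output_bytes))  # sized from map output

# Extrapolating from two early map tasks out of 100 (made-up sizes).
guess = extrapolate_map_output([10**6, 3 * 10**6], total_map_tasks=100)
print(estimate_reducers(guess))
```

Sizing from map input asks for far more reducers than sizing from map output when a filter
discards most rows. The extrapolation is only as good as the assumption that early map tasks
are representative, which is the reporter's objection about filter pushdown.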



--
This message was sent by Atlassian JIRA
(v6.2#6252)
