Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Tue, 24 Jun 2014 06:02:24 +0000 (UTC)
From: "wangmeng (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12723278.1403587411113.34822.1403589744654@arcas>
In-Reply-To: <JIRA.12723278.1403587411113@arcas>
References: <JIRA.12723278.1403587411113@arcas>
Subject: [jira] [Commented] (HIVE-7277) how to decide reduce numbers
 according to the input size of reduce stage rather than the input size
 of map stage?
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HIVE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041741#comment-14041741 ]

wangmeng commented on HIVE-7277:
--------------------------------

As far as I know, Tez is a new compute engine, different from MapReduce. Is there any solution based on the MapReduce engine?

> how to decide reduce numbers according to the input size of reduce stage rather than the input size of map stage?
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7277
>                 URL: https://issues.apache.org/jira/browse/HIVE-7277
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: wangmeng
>             Fix For: 0.13.0
>
>
> As we know, Hive currently decides the number of reducers simply as (input size of the map stage) / hive.exec.reducers.bytes.per.reducer (default 1G).
> But I think the output size of the map stage may differ greatly from the original input size, so this strategy for deciding the number of reducers may be improper.
> So is there any feature that can decide the number of reducers according to the output of the map stage? Thanks.
> As far as I know, the reduce stage actually begins shortly after some map tasks have finished rather than waiting until the whole map stage has finished, so I think it is also improper to decide the number of reducers only once the whole map stage has finished.
> As someone pointed out, we could estimate the total number of reducers from the output size of the earliest map tasks to finish. However, Hive now uses filter pushdown (WHERE), which may result in a big difference in output size from one map task to another.
> So this estimation is improper.
> Thanks.
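
To make the two concerns in the quoted description concrete, here is a rough Python sketch. It is not Hive's code. It assumes the sizing rule is ceil(bytes / hive.exec.reducers.bytes.per.reducer) clamped to at least 1, that "1G" means 10^9 bytes, and that an upper cap applies (the value 999 is an assumption standing in for hive.exec.reducers.max). The helper extrapolate_map_output and all sizes are hypothetical, chosen only for illustration.

```python
BYTES_PER_REDUCER = 1_000_000_000  # the "1G" default quoted above (assumed 10^9 bytes)
MAX_REDUCERS = 999                 # assumed cap, standing in for hive.exec.reducers.max


def estimate_reducers(total_bytes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Return ceil(total_bytes / bytes_per_reducer), clamped to [1, max_reducers]."""
    reducers = (total_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    return max(1, min(max_reducers, reducers))


def extrapolate_map_output(finished_outputs, total_map_tasks):
    """Hypothetical helper: scale the mean output of finished map tasks to all tasks."""
    return sum(finished_outputs) * total_map_tasks // len(finished_outputs)


# Concern 1: 100 GB of map input, but a WHERE filter leaves only 2 GB of map output.
print(estimate_reducers(100 * 10**9))  # sized from map input  -> 100
print(estimate_reducers(2 * 10**9))    # sized from map output -> 2

# Concern 2: per-task output is skewed. The 4 earliest map tasks emit 10 MB each,
# while the remaining 96 emit 500 MB each.
early = [10 * 10**6] * 4
late = [500 * 10**6] * 96
guessed_total = extrapolate_map_output(early, total_map_tasks=100)
actual_total = sum(early) + sum(late)
print(estimate_reducers(guessed_total))  # from the early tasks only -> 1
print(estimate_reducers(actual_total))   # from the real map output  -> 49
```

With these made-up numbers, sizing from map input overshoots by a factor of 50, and sampling only the earliest map tasks undershoots by a similar factor, which is the gap the description is pointing at.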


--
This message was sent by Atlassian JIRA
(v6.2#6252)