hive-dev mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: Potential bug around hive merging of small files
Date Tue, 13 Mar 2012 22:44:59 GMT
I have opened https://issues.apache.org/jira/browse/HIVE-2869

On Tue, Mar 13, 2012 at 8:37 AM, Ashutosh Chauhan <hashutosh@apache.org> wrote:

> This does look like a bug. Shrijeet, would you mind opening a JIRA and
> attaching your patch there?
>
> Thanks,
> Ashutosh
> On Mon, Mar 12, 2012 at 16:29, Shrijeet Paliwal <shrijeet@rocketfuel.com> wrote:
>
> > I had a typo in my last email. The settings are as follows:
> >
> > hive> set mapred.min.split.size.per.node=1000000000;
> > hive> set mapred.min.split.size.per.rack=1000000000;
> > hive> set mapred.max.split.size=1000000000;
> > hive> set hive.merge.size.per.task=1000000000;
> > hive> set hive.merge.smallfiles.avgsize=1000000000;
> > hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> > hive> set hive.merge.mapfiles=true;
> > hive> set hive.merge.mapredfiles=true;
> > hive> set hive.mergejob.maponly=false;
> >
> > On Mon, Mar 12, 2012 at 4:27 PM, Shrijeet Paliwal
> > <shrijeet@rocketfuel.com> wrote:
> >
> > > Hive version: Hive 0.8 (last commit SHA
> > > b581a6192b8d4c544092679d05f45b2e50d42b45)
> > >
> > > Hadoop version: CDH3u0
> > >
> > > I am trying to use the Hive small-file merge feature by setting all the
> > > necessary parameters.
> > > I am disabling the use of CombineHiveInputFormat since my input is
> > > compressed text.
> > >
> > > hive> set mapred.min.split.size.per.node=1000000000;
> > > hive> set mapred.min.split.size.per.rack=1000000000;
> > > hive> set mapred.max.split.size=1000000000;
> > > hive> set hive.merge.size.per.task=1000000000;
> > > hive> set hive.merge.smallfiles.avgsize=1000000000;
> > > hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> > > hive> set hive.merge.mapfiles=false;
> > > hive> set hive.merge.mapredfiles=true;
> > >
> > >
> > > The plan decides to launch two MR jobs, but after the first job succeeds
> > > I get a runtime error:
> > >
> > > "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but
> > > reduce operator specified"
> > >
> > > I think the problem can be fixed with this patch I came up with:
> > > https://gist.github.com/2025303
> > >
> > > Of course, my understanding, and hence this patch, could be totally wrong.
> > > Please provide feedback.
> > >
> >
>
