hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: how to do parallel scanning in map reduce using hbase as input?
Date Tue, 22 Jul 2014 22:31:08 GMT
On Mon, Jul 21, 2014 at 11:11 PM, Li Li <fancyerii@gmail.com> wrote:

> On Tue, Jul 22, 2014 at 1:57 PM, Stack <stack@duboce.net> wrote:
> > On Mon, Jul 21, 2014 at 10:53 PM, Li Li <fancyerii@gmail.com> wrote:
> >
> >> Sorry, I hit tab and it sent my unfinished post. See the following
> >> mail for answers to the other questions.
> >>
> >> I forgot the exception's details. It was thrown in the terminal.
> >
> >
> > What exception is thrown?
> I forgot it. Maybe I can retry with the 8-mapper configuration. It
> seemed like an out-of-memory exception.
>


Who OOME'd?  The map task or hbase?



> >
> >
> >
> >> The
> >> default io.sort.mb is 100 and I set it to 500 to speed up reducer.
> >
> >
> > Do you have to have a reducer?  If you could skip the shuffle...
> I have 8 reducers
>


Do you have to reduce?

Would more reducers make your job run faster?



> >
> >
> >
> >> So
> >> I set mapred.child.java.opts to 1g
> >> The datanode/regionserver has 16GB memory but free memory
> >
> >
> > Does the RS use the 16G?
> The RS uses 8G, and there are a datanode and a tasktracker on this machine.
> >
>


How much for DN and TT?  They don't need much usually.



> >
> >
> >> for
> >> map-reduce is about 5gb. So I can't add more mappers
> >>
> >>
> >> How much RAM in these machines?
> 16GB



Are these your machines or EC2?  Can you get bigger machines if on EC2?

St.Ack
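
The memory figures quoted in this thread can be checked with a little arithmetic. The sketch below is only an illustration, not anything either poster ran: it takes "about 5gb" free for map-reduce, a 1g child heap (mapred.child.java.opts) and io.sort.mb=500 from the messages above, and assumes that the 8 task JVMs all run on a single node, which the thread does not state. The class and method names are hypothetical.

```java
public class MemoryBudget {

    /** How many task JVMs of childHeapMb each fit into freeMb of RAM. */
    public static int maxTaskSlots(int freeMb, int childHeapMb) {
        if (childHeapMb <= 0) {
            throw new IllegalArgumentException("childHeapMb must be positive");
        }
        return Math.max(0, freeMb / childHeapMb);
    }

    /** Heap left in a task JVM once the io.sort.mb buffer is allocated. */
    public static int heapLeftAfterSortBufferMb(int childHeapMb, int ioSortMb) {
        return childHeapMb - ioSortMb;
    }

    public static void main(String[] args) {
        int freeForMapReduceMb = 5 * 1024; // "about 5gb" per the thread
        int childHeapMb = 1024;            // mapred.child.java.opts set to 1g
        int ioSortMb = 500;                // io.sort.mb raised from 100 to 500
        int requestedTasks = 8;            // assumption: 8 task JVMs on one node

        int slots = maxTaskSlots(freeForMapReduceMb, childHeapMb);
        int heapLeft = heapLeftAfterSortBufferMb(childHeapMb, ioSortMb);
        int requestedMb = requestedTasks * childHeapMb;

        System.out.println("Task JVMs that fit: " + slots);
        System.out.println("Heap left per task after sort buffer (MB): " + heapLeft);
        System.out.println("Requested (MB): " + requestedMb
                + ", available (MB): " + freeForMapReduceMb);
    }
}
```

Under those assumptions, eight 1g task JVMs would ask for more heap than the roughly 5GB left over, and the 500MB sort buffer takes about half of each task's heap, which is why the answers to the questions above (who ran out of memory, and how much the DN and TT actually use) matter before changing anything.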
