hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: set number of map tasks for HBase MR
Date Sun, 11 Apr 2010 13:30:58 GMT
I noticed that mapreduce.Export.createSubmittableJob() doesn't call
setCaching() in 0.20.3.

Should a call to setCaching() be added?

Thanks

On Sun, Apr 11, 2010 at 2:14 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> By default, a MapReduce job against an HBase table cannot have more map
> tasks than the number of regions in that table.
>
> Also, you want to enable scanner caching. Pass
> TableMapReduceUtil.initTableMapperJob a Scan object configured with
> scan.setCaching(some_value), where the value is the number of rows to
> fetch every time we hit a region server with next(). For rows of
> 100-200 bytes, our jobs are usually configured with values from 1000
> up to 10000.
>
> Finally, is your job running in local mode or on a job tracker? Even
> if HBase uses HDFS, it usually doesn't know of the job tracker unless
> you configure HBase's classpath with Hadoop's conf.
>
> J-D
>
> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
> <crypto5@mail.saturnfans.com> wrote:
> > Hi,
> >
> > thanks for the quick response. I tried the following in the code:
> >
> > job.getConfiguration().setInt("mapred.map.tasks", 10000);
> >
> > but unfortunately I get the same result.
> >
> > Any other ideas?
> >
> > --- amansk@gmail.com wrote:
> >
> > From: Amandeep Khurana <amansk@gmail.com>
> > To: hbase-user@hadoop.apache.org, crypto5@mail.saturnfans.com
> > Subject: Re: set number of map tasks for HBase MR
> > Date: Sat, 10 Apr 2010 20:04:18 -0700
> >
> > You can set the number of map tasks in your job config to a big number
> > (eg: 100000), and the library will automatically spawn one map task per
> > region.
> >
> > -ak
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
> > crypto5@mail.saturnfans.com> wrote:
> >
> >> Hi guys,
> >>
> >> I have an HBase table of about 8 GB and I want to run an MR job against
> >> it. It runs extremely slowly in my case. One thing I noticed is that the
> >> job runs only 2 map tasks. Is there any way to set a larger number of map
> >> tasks? I saw some methods in the mapred package, but can't find anything
> >> like this in the new mapreduce package.
> >>
> >> I run my MR job on a single machine in cluster mode.
> >>
> >>
> >> _____________________________________________________________
> >> Sign up for your free SaturnFans email account at
> >> http://webmail.saturnfans.com/
> >>
> >
>
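
To put rough numbers on the scanner caching advice quoted above: each next() call to a region server returns up to the configured number of rows, so the number of round trips a scan needs is roughly the row count divided by the caching value, rounded up. The sketch below is plain Java with no HBase dependency. The class name, the method name, and the 10,000,000-row table size are hypothetical, chosen only for illustration. The caching value of 1 is included as a low-end comparison point and is not a claim about any particular HBase default.

```java
public class ScanCachingEstimate {

    // Estimated number of next() round trips needed to read `rows` rows
    // when each call returns up to `caching` rows (ceiling division).
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // Hypothetical row count, not taken from the thread.
        long rows = 10_000_000L;

        // 1 is a low-end comparison point; 1000 and 10000 are the values
        // mentioned in the thread for rows of 100-200 bytes.
        for (int caching : new int[] {1, 1000, 10000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " next() round trips");
        }
    }
}
```

This counts round trips only. It ignores per-call latency, row size, and the memory needed to hold each batch, so treat it as a way to compare caching settings against each other, not as a prediction of job runtime.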
