hbase-dev mailing list archives

From Ophir Cohen <oph...@gmail.com>
Subject Re: TableInputFormat improvement to handle lots of small regions
Date Thu, 30 Jun 2011 15:38:30 GMT
Actually, I was thinking of the opposite case:
if I have spare map slots, why not configure the job to run more than one
mapper per region?
The question then is how to make each mapper 'skip' to the right place inside
the region.
Ophir
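
One way to picture the 'skip' step is to cut a region's key range into evenly spaced sub-ranges and give each mapper its own start/stop row. The following is only a minimal sketch under stated assumptions: row keys are fixed-width, non-empty byte arrays compared as unsigned big-endian numbers, and `RegionSubSplitter`, `boundaries` and `toFixedWidth` are hypothetical names, not part of HBase.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
final class RegionSubSplitter {

    // Returns parts + 1 boundary keys running from start to stop (both included).
    // Each adjacent pair of keys is the start/stop row for one mapper.
    static List<byte[]> boundaries(byte[] start, byte[] stop, int parts) {
        if (parts < 1 || start.length == 0 || start.length != stop.length) {
            throw new IllegalArgumentException("need parts >= 1 and equal-width, non-empty keys");
        }
        BigInteger lo = new BigInteger(1, start); // signum 1: treat the bytes as unsigned
        BigInteger hi = new BigInteger(1, stop);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before stop");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i <= parts; i++) {
            // lo + span * i / parts, using integer division
            BigInteger k = lo.add(span.multiply(BigInteger.valueOf(i))
                                      .divide(BigInteger.valueOf(parts)));
            keys.add(toFixedWidth(k, start.length));
        }
        return keys;
    }

    // BigInteger.toByteArray() may add a leading sign byte or drop leading zeros,
    // so right-align the magnitude into exactly `width` bytes.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

For example, `boundaries(new byte[]{0x00}, new byte[]{0x40}, 4)` yields the keys 0, 16, 32, 48 and 64. Even spacing in key space does not imply even spacing in data, and the first and last regions of a table have empty start/end keys, which this sketch does not handle.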

On Wed, Jun 22, 2011 at 15:08, Cosmin Lehene <clehene@adobe.com> wrote:

> We overrode getSplits so that it calls super.getSplits and then, using a
> configuration variable (splitsPerMap), outputs another set of splits that
> basically merges (by start/stop row manipulation) the original splits array.
>
> This can easily be modified to take the desired number of maps instead of
> the number of regions per map (just a matter of taste here :))
>
> Cosmin
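
The merge step described above can be sketched roughly as follows. This is not the code from the thread: `SplitMerger` and `RowRange` are hypothetical stand-ins for the real split objects, and the sketch assumes the region splits arrive sorted by start row and are contiguous, so that a merged split can be described by the first start row and the last end row of its group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class.
final class SplitMerger {

    // Simplified stand-in for a table split: just a start row and an end row.
    static final class RowRange {
        final String startRow;
        final String endRow;

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Merges every `splitsPerMap` consecutive region splits into one larger split.
    static List<RowRange> merge(List<RowRange> regionSplits, int splitsPerMap) {
        if (splitsPerMap < 1) {
            throw new IllegalArgumentException("splitsPerMap must be >= 1");
        }
        List<RowRange> merged = new ArrayList<>();
        for (int i = 0; i < regionSplits.size(); i += splitsPerMap) {
            // Index of the last region in this group; the final group may be smaller.
            int last = Math.min(i + splitsPerMap, regionSplits.size()) - 1;
            merged.add(new RowRange(regionSplits.get(i).startRow,
                                    regionSplits.get(last).endRow));
        }
        return merged;
    }
}
```

With five regions and `splitsPerMap = 2`, this produces three splits covering regions 1-2, 3-4 and 5. To drive it by a desired number of maps instead, one could derive `splitsPerMap` as the region count divided by the map count, rounded up.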
> On Jun 21, 2011, at 4:18 AM, Ma, Ming wrote:
>
> > TableInputFormat creates one split/mapper task per region. When there are
> > lots of small regions, the per-task overhead of the MapReduce framework
> > becomes significant. Some related work items could address this issue.
> >
> >
> > 1. Reduce the number of small regions.
> >    https://issues.apache.org/jira/browse/HBASE-420
> >
> > 2. Improvement in map reduce framework to handle small jobs.
> >    https://issues.apache.org/jira/browse/MAPREDUCE-1220
> >
> > Another quick way to solve this is to improve TableInputFormat so
> > that it can pack a configurable number of regions from a given region server
> > into one mapper task. I tested this approach and achieved a 40%
> > improvement in map job latency.
> >
> > Any feedback?
> >
> > Ming
>
>
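
The packing idea in the original proposal (a configurable number of regions from the same region server per mapper task) might look something like the sketch below. It is an illustration only, not the tested patch: `RegionPacker`, `RegionSplit` and `pack` are hypothetical names, and each region is reduced to a name plus the host that serves it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an HBase class.
final class RegionPacker {

    // Simplified stand-in for a per-region split: region name and serving host.
    static final class RegionSplit {
        final String regionName;
        final String host;

        RegionSplit(String regionName, String host) {
            this.regionName = regionName;
            this.host = host;
        }
    }

    // Groups splits by host, then cuts each host's list into packs of at most
    // `regionsPerMapper` regions. Each pack would become one mapper task.
    static List<List<RegionSplit>> pack(List<RegionSplit> splits, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        // LinkedHashMap keeps hosts in first-seen order, so the output is deterministic.
        Map<String, List<RegionSplit>> byHost = new LinkedHashMap<>();
        for (RegionSplit s : splits) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<List<RegionSplit>> packs = new ArrayList<>();
        for (List<RegionSplit> hostSplits : byHost.values()) {
            for (int i = 0; i < hostSplits.size(); i += regionsPerMapper) {
                int end = Math.min(i + regionsPerMapper, hostSplits.size());
                packs.add(new ArrayList<>(hostSplits.subList(i, end)));
            }
        }
        return packs;
    }
}
```

Because regions on one server are generally not adjacent in key space, a pack cannot be collapsed into a single start/stop row; the mapper would have to scan each region in its pack in turn. In exchange, every pack keeps a single preferred host.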
