hbase-user mailing list archives

From Graeme Wallace <graeme.wall...@farecompare.com>
Subject Re: MapReduce: Reducers partitions.
Date Wed, 10 Apr 2013 18:52:50 GMT
What's the behavior, then, if you return hash % num_reducers and you have no
splits defined? When the reducer writes to the table, does the region server
local to the reducer create a new region?

Graeme
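
For reference, the fallback discussed in this thread (spreading keys by hash
when the output table has only one region) might look roughly like the sketch
below. It is plain Java with no Hadoop or HBase dependencies. The class name
FallbackPartitionSketch, both method names, and the use of CRC32 as the
"keycrc" are illustrative assumptions, not the actual HRegionPartitioner code,
and the multi-region branch is deliberately simplified.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class FallbackPartitionSketch {

    /**
     * Index of the region that would hold rowKey: the last region whose
     * start key is <= rowKey in unsigned lexicographic byte order.
     * Assumes startKeys is sorted and its first entry is the empty key.
     */
    public static int regionIndexFor(byte[] rowKey, byte[][] startKeys) {
        int idx = 0;
        for (int i = 0; i < startKeys.length; i++) {
            if (Arrays.compareUnsigned(startKeys[i], rowKey) <= 0) {
                idx = i;
            } else {
                break;
            }
        }
        return idx;
    }

    /** Picks a reducer for rowKey (hypothetical variant of the quoted check). */
    public static int getPartition(byte[] rowKey, byte[][] startKeys, int numPartitions) {
        if (startKeys.length == 1) {
            // Single region: instead of "return 0", spread keys by checksum.
            // Equal keys always produce the same checksum, so they still
            // reach the same reducer.
            CRC32 crc = new CRC32();
            crc.update(rowKey);
            return (int) (crc.getValue() % numPartitions);
        }
        // Several regions: one reducer per region, wrapping around if there
        // are more regions than reducers (a simplification for this sketch).
        return regionIndexFor(rowKey, startKeys) % numPartitions;
    }

    public static void main(String[] args) {
        byte[][] singleRegion = { new byte[0] };
        for (String k : new String[] { "stat-001", "stat-002", "stat-003" }) {
            byte[] key = k.getBytes(StandardCharsets.UTF_8);
            System.out.println(k + " -> reducer " + getPartition(key, singleRegion, 4));
        }
    }
}
```

Either way, the partitioner only decides which reducer receives a key; it does
not create regions. With no splits defined, every reducer's writes still land
in the table's single region until that region grows large enough to split.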


On Wed, Apr 10, 2013 at 1:26 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> So.
>
> I looked at the code, and I have one comment/suggestion here.
>
> If the table we are outputting to has regions, then partitions are built
> around that, and that's fine. But if the table is totally empty with a
> single region, even if we setNumReduceTasks to 2 or more, all the keys will
> go to the same first reducer because of this:
>     if (this.startKeys.length == 1){
>       return 0;
>     }
> I think it would be better to return something like keycrc % numPartitions
> instead. That would still allow the application to spread the reducing
> process over multiple nodes (racks) even if there is only one region in the
> table.
>
> In my use case, I have millions of lines producing some statistics. At the
> end, I will have only about 600 lines, but it will take a lot of map and
> reduce time to go from millions to 600, that's why I'm looking to have more
> than one reducer. However, with only 600 lines, it's very difficult to
> pre-split the table. Keys are all very close.
>
> Does anyone see anything wrong with changing this default behaviour when
> startKeys.length == 1? If not, I will open a JIRA and upload a patch.
>
> JM
>
> 2013/4/10 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>
> > Thanks Ted.
> >
> > It's exactly where I was looking at now. I was close. I will take a
> > deeper look.
> >
> > Thanks Nitin for the link. I will read that too.
> >
> > JM
> >
> > 2013/4/10 Nitin Pawar <nitinpawar432@gmail.com>
> >
> >> To add to what Ted said,
> >>
> >> the same discussion happened on the question Jean asked
> >>
> >> https://issues.apache.org/jira/browse/HBASE-1287
> >>
> >>
> >> On Wed, Apr 10, 2013 at 7:28 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>
> >> > Jean-Marc:
> >> > Take a look at HRegionPartitioner which is in both mapred and
> >> > mapreduce packages:
> >> >
> >> >  * This is used to partition the output keys into groups of keys.
> >> >  * Keys are grouped according to the regions that currently exist
> >> >  * so that each reducer fills a single region so load is distributed.
> >> >
> >> > Cheers
> >> >
> >> > On Wed, Apr 10, 2013 at 6:54 AM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> > > Hi Nitin,
> >> > >
> >> > > You got my question correctly.
> >> > >
> >> > > However, I'm wondering how it works when it's done in HBase. Do we
> >> > > have default partitioners, so that we have the same guarantee that
> >> > > records mapping to one key go to the same reducer? Or do we have to
> >> > > implement this on our own?
> >> > >
> >> > > JM
> >> > >
> >> > > 2013/4/10 Nitin Pawar <nitinpawar432@gmail.com>:
> >> > > > I hope I understood what you are asking. If not, then pardon me :)
> >> > > > Here are a few lines from the Hadoop developer handbook:
> >> > > >
> >> > > > The *Partitioner* class determines which partition a given
> >> > > > (key, value) pair will go to. The default partitioner computes a
> >> > > > hash value for the key and assigns the partition based on this
> >> > > > result. It guarantees that all the records mapping to one key go
> >> > > > to the same reducer.
> >> > > >
> >> > > > You can write your own custom partitioner as well.
> >> > > > Here is the link:
> >> > > > http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 10, 2013 at 6:19 PM, Jean-Marc Spaggiari <
> >> > > > jean-marc@spaggiari.org> wrote:
> >> > > >
> >> > > >> Hi,
> >> > > >>
> >> > > >> Quick question. How is the data from the map tasks partitioned for
> >> > > >> the reducers?
> >> > > >>
> >> > > >> If there is 1 reducer, it's easy, but if there are more, are all the
> >> > > >> same keys guaranteed to end up on the same reducer? Or not
> >> > > >> necessarily? If they are not, how can we provide a partitioning
> >> > > >> function?
> >> > > >>
> >> > > >> Thanks,
> >> > > >>
> >> > > >> JM
> >> > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Nitin Pawar
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Nitin Pawar
> >>
> >
> >
>



-- 
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018
