hadoop-mapreduce-user mailing list archives

From Lucian Iordache <lucian.george.iorda...@gmail.com>
Subject Re: TableSplit on 2 regions that are on different region servers
Date Mon, 04 Jul 2011 15:36:52 GMT

As I understand it, the regionLocation is only used by the JobTracker to
decide on which region server's host the task should run, so that it gets
the best data locality. So in my case it should not be a problem to use the
location of the region that holds the most data in the split.

So the problem is solved!
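
For anyone finding this thread later, here is a minimal sketch of that choice in plain Java. It is only a model of the idea, not HBase code: RegionPart, RangeSplit, SpanningSplits, the host names and the byte estimates are all hypothetical. In a real job I assume the same logic would live in an override of TableInputFormatBase.getSplits and build TableSplit objects instead.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one region's share of a split (not an HBase class). */
final class RegionPart {
    final String host;  // region server hosting this part of the range
    final long bytes;   // assumed estimate of the data this part contributes

    RegionPart(String host, long bytes) {
        this.host = host;
        this.bytes = bytes;
    }
}

/** Hypothetical stand-in for TableSplit: a row range plus one locality hint. */
final class RangeSplit {
    final String startRow;  // inclusive
    final String endRow;    // exclusive
    final String location;  // scheduling hint only, does not limit what is read

    RangeSplit(String startRow, String endRow, String location) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.location = location;
    }
}

final class SpanningSplits {
    /** Returns the host of the largest part; ties go to the earliest part. */
    static String pickLocation(List<RegionPart> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("a split needs at least one part");
        }
        RegionPart best = parts.get(0);
        for (RegionPart p : parts) {
            if (p.bytes > best.bytes) {
                best = p;
            }
        }
        return best.host;
    }

    /** Builds one split over [startRow, endRow) even if it crosses regions. */
    static RangeSplit spanning(String startRow, String endRow, List<RegionPart> parts) {
        return new RangeSplit(startRow, endRow, pickLocation(parts));
    }

    public static void main(String[] args) {
        // One whole region on rs1 plus a few rows of the next region on rs2.
        RangeSplit s = spanning("user1000", "user2050", Arrays.asList(
                new RegionPart("rs1.example.com", 900L),
                new RegionPart("rs2.example.com", 50L)));
        System.out.println(s.startRow + " .. " + s.endRow + " @ " + s.location);
    }
}
```

The rows that sit on the other region server are still read, just remotely through the normal client scan, so only locality for that small tail is lost.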


On Mon, Jul 4, 2011 at 5:18 PM, Lucian Iordache <
lucian.george.iordache@gmail.com> wrote:

> Hello guys,
> I have a problem with table split generation for a MapReduce job
> running on an HBase table. By default, the table splits are the regions,
> each having a startRow, an endRow and a regionLocation.
> What happens if I want to create a split that contains a region plus some
> rows from the next one? (I have a user whose information spans 2
> regions, but I want to process all the rows in the order in which they are
> stored in HBase, which is why I want all rows of a user to be in the same
> split for MapReduce.)
> So, can I create a TableSplit like that? What happens if the 2 regions are
> on different region servers (the split has only one regionLocation field)?
> Best Regards,
> --
> Lucian
