hbase-issues mailing list archives

From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11562) CopyTable should provide an option to shuffle the mapper tasks
Date Tue, 22 Jul 2014 14:32:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070314#comment-14070314 ]

Jean-Marc Spaggiari commented on HBASE-11562:

This sounds like a useful detail. Make it the default behavior?
This will change the default behaviour. That's why I preferred to keep it false by default,
to preserve the current behaviour while still allowing it. Maybe we can turn it on by default
in 0.99 and leave it off by default on the other branches?

In practice, shouldn't a well balanced table have fairly random region -> RegionServer
Yes, it should. On a 100-node cluster with a 3000-region table, when you pick up the first
300 regions you will most probably have about 3 regions on each server, so all 100 servers
will send a lot of puts. Now, if the destination is a 10-region table, all those 300 source
regions will most probably go into the same destination region, and so the same region server.
So the issue is more on the destination side than in the source table distribution.
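
To make the numbers above concrete, here is a small stand-alone sketch (plain Java, no HBase classes). It assumes, purely for illustration, that the source and destination tables split the same key space evenly, so that each of the 10 destination regions covers 300 consecutive source regions. The class and method names are hypothetical and are not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class HotspotSketch {
    // Maps a source region index to a destination region index, assuming both
    // tables split the same key space evenly (an assumption for illustration).
    static int destinationRegion(int sourceRegion, int sourceRegions, int destRegions) {
        return (int) ((long) sourceRegion * destRegions / sourceRegions);
    }

    // Counts how many of the first `batch` tasks write to each destination region.
    static int[] writesPerDestination(List<Integer> taskOrder, int batch,
                                      int sourceRegions, int destRegions) {
        int[] counts = new int[destRegions];
        for (int i = 0; i < batch; i++) {
            counts[destinationRegion(taskOrder.get(i), sourceRegions, destRegions)]++;
        }
        return counts;
    }

    // One task per source region, in key order (region 0, 1, 2, ...).
    static List<Integer> keyOrderedTasks(int sourceRegions) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < sourceRegions; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    public static void main(String[] args) {
        int sourceRegions = 3000, destRegions = 10, batch = 300;
        List<Integer> ordered = keyOrderedTasks(sourceRegions);
        List<Integer> shuffled = new ArrayList<>(ordered);
        Collections.shuffle(shuffled, new Random());

        // In key order, the first 300 tasks all target destination region 0.
        System.out.println("key order: " + Arrays.toString(
            writesPerDestination(ordered, batch, sourceRegions, destRegions)));
        // After shuffling, the first 300 tasks are spread over the destination regions.
        System.out.println("shuffled:  " + Arrays.toString(
            writesPerDestination(shuffled, batch, sourceRegions, destRegions)));
    }
}
```

The sketch only models which destination region each task writes to. It says nothing about how regions are assigned to region servers.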

You see logically adjacent regions piling up on the same RS?

 Not necessarily, but since we balance per cluster and not per table, there is still a chance
that 2 of the first 10 regions end up on the same region server, which will make things
even worse.

I now need to figure out why Jenkins doesn't like my patches. Will re-submit later today.

> CopyTable should provide an option to shuffle the mapper tasks
> --------------------------------------------------------------
>                 Key: HBASE-11562
>                 URL: https://issues.apache.org/jira/browse/HBASE-11562
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.99.0, 0.94.20, 0.98.4
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>         Attachments: HBASE-11562-v0-trunk.patch, HBASE-11562-v1-trunk.patch
> When doing a copy table from a table with a lot of regions to a table with far fewer regions,
> on a cluster with a limited number of mappers, the map tasks are ordered by key, so tasks will
> first run for the first few regions and will hotspot a single region server on the destination.
> To avoid this, we should submit the map tasks in a random order.
> This JIRA is to add this option to CopyTable and TableInputFormat
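
As a rough illustration of the idea in the description, the sketch below shuffles a key-ordered list of splits before submission, and leaves it untouched when the option is off. It is not the code from the attached patches: the class name, the method name and the `shuffle` flag are hypothetical, and plain strings stand in for the real input splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleSplitsSketch {
    // Returns the splits in the order they would be submitted. When `shuffle`
    // is false the key order is kept, which preserves the current behaviour.
    static <T> List<T> submissionOrder(List<T> keyOrderedSplits, boolean shuffle, Random rng) {
        List<T> result = new ArrayList<>(keyOrderedSplits);
        if (shuffle) {
            Collections.shuffle(result, rng);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder split names; real splits would carry start and end keys.
        List<String> splits = Arrays.asList("split-0", "split-1", "split-2", "split-3");
        System.out.println("option off: " + submissionOrder(splits, false, new Random()));
        System.out.println("option on:  " + submissionOrder(splits, true, new Random()));
    }
}
```

Copying the list before shuffling keeps the caller's key-ordered list intact, which matters if anything else still relies on that order.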

