hbase-issues mailing list archives

From "xinxin fan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region
Date Fri, 17 Nov 2017 02:12:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xinxin fan updated HBASE-18090:
-------------------------------
    Release Note: 
In this task, we make it possible to run multiple mappers per region when reading a table snapshot.
With this feature, clients can specify the desired number of mappers per region when initializing
a table snapshot mapper job:
{code}
TableMapReduceUtil.initTableSnapshotMapperJob(
          snapshotName,      // the name of the snapshot (of a table) to read from
          scan,              // Scan instance to control CF and attribute selection
          mapper,            // the mapper class to use
          outputKeyClass,    // mapper output key class
          outputValueClass,  // mapper output value class
          job,               // the current job to adjust
          true,              // upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars)
          restoreDir,        // a temporary directory to copy the snapshot files into
          splitAlgorithm,    // algorithm used to split each region; currently only RegionSplitter.UniformSplit() and RegionSplitter.HexStringSplit() are supported
          n                  // how many input splits to generate per region
);
{code}
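
The release note does not spell out how one region's key range is divided into n input splits. As a rough illustration only, the following self-contained Java sketch mimics the idea behind a uniform split: it treats row keys as fixed-width unsigned integers and computes n - 1 evenly spaced boundaries between a region's start and end keys. The class name, `KEY_WIDTH`, and `interiorSplitPoints` are hypothetical and are not part of HBase; this is not the RegionSplitter.UniformSplit implementation, just a sketch under the stated fixed-width-key assumption.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: divide one region's key range into n sub-ranges.
class UniformRegionSplitSketch {

    // Assumption: keys are compared as KEY_WIDTH-byte unsigned values.
    // Shorter keys are right-padded with 0x00; longer keys are truncated (approximation).
    static final int KEY_WIDTH = 8;

    // Interprets a key as an unsigned big-endian integer of KEY_WIDTH bytes.
    static BigInteger toBigInt(byte[] key) {
        byte[] padded = Arrays.copyOf(key, KEY_WIDTH); // pads with zeros or truncates
        return new BigInteger(1, padded);              // signum 1 => treat bytes as unsigned
    }

    // Converts a value in [0, 2^(8*KEY_WIDTH)) back into a KEY_WIDTH-byte key.
    static byte[] toKey(BigInteger value) {
        byte[] raw = value.toByteArray();              // may carry a leading 0x00 sign byte
        byte[] out = new byte[KEY_WIDTH];
        int copy = Math.min(raw.length, KEY_WIDTH);
        System.arraycopy(raw, raw.length - copy, out, KEY_WIDTH - copy, copy);
        return out;
    }

    // Returns the numSplits - 1 interior boundaries that cut [startKey, endKey) into
    // numSplits roughly equal ranges. Empty keys mean "start/end of table", as for the
    // first and last region. Very narrow ranges may produce duplicate boundaries.
    static List<byte[]> interiorSplitPoints(byte[] startKey, byte[] endKey, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        BigInteger lo = startKey.length == 0 ? BigInteger.ZERO : toBigInt(startKey);
        BigInteger hi = endKey.length == 0
                ? BigInteger.ONE.shiftLeft(8 * KEY_WIDTH)  // one past the largest key
                : toBigInt(endKey);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger width = hi.subtract(lo);
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < numSplits; i++) {
            // lo + i * (hi - lo) / numSplits, multiplying before dividing to keep precision
            BigInteger p = lo.add(width.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(numSplits)));
            points.add(toKey(p));
        }
        return points;
    }

    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A region covering [0x10, 0x20) split for n = 4 mappers: three interior boundaries.
        for (byte[] p : interiorSplitPoints(new byte[] {0x10}, new byte[] {0x20}, 4)) {
            System.out.println(toHex(p)); // 1400000000000000, 1800000000000000, 1c00000000000000
        }
    }
}
```

With n = 4, the region [0x10, 0x20) yields three boundaries and therefore four scan ranges, one per mapper. Evenly spaced boundaries only give balanced mappers when keys are spread reasonably evenly across the region, which is the same assumption the issue description makes.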


> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Mikhail Antonov
>            Assignee: xinxin fan
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-18090-V3-master.patch, HBASE-18090-V4-master.patch, HBASE-18090-V5-master.patch,
>                      HBASE-18090-branch-1-v2.patch, HBASE-18090-branch-1-v2.patch, HBASE-18090-branch-1.3-v1.patch,
>                      HBASE-18090-branch-1.3-v2.patch, HBASE-18090.branch-1.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. This places an
> unnecessary restriction on the original table: its region layout has to take into account the
> processing resources available to the MR job. Allowing multiple mappers to run per region
> (assuming a reasonably even key distribution) would be useful.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
