hbase-issues mailing list archives

From "Mikhail Antonov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region
Date Thu, 25 May 2017 20:44:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025363#comment-16025363 ]

Mikhail Antonov commented on HBASE-18090:

Oh, thanks for the reference! I hadn't seen that one. I don't see any patches there, so maybe
this one would do some good.

Any feedback on the patch? I think that, assuming reasonably even key distribution across regions,
specifying just the number of splits per region and the split algorithm should suffice. That is
simpler and cheaper than trying to compute the actual distribution based on data in HFiles.
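
To make the idea concrete, here is a rough sketch of dividing one region's key range into N
equal-width sub-ranges, one per mapper. It is not the code from the attached patch. The class
name RegionRangeSplitter, the split() signature, and the fixed key "width" parameter are all
made up for illustration, and the arithmetic treats keys as unsigned integers, which only
makes sense under the even-key-distribution assumption above. HBase ships its own split
algorithms (for example RegionSplitter.UniformSplit); plain Java is used here only to keep the
sketch self-contained.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: uniform split of a single region's key range.
final class RegionRangeSplitter {

    // Read a key as an unsigned integer of exactly `width` bytes
    // (right-padded with 0x00, or truncated if longer; `width` is an assumption).
    static BigInteger toUnsigned(byte[] key, int width) {
        return new BigInteger(1, Arrays.copyOf(key, width));
    }

    // Convert a value below 2^(8*width) back into a key of exactly `width` bytes.
    static byte[] toKey(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Returns up to n contiguous [start, end) pairs covering the region.
    // An empty start or end key means "open-ended", following the HBase convention.
    static List<byte[][]> split(byte[] start, byte[] end, int n, int width) {
        if (n < 1 || width < 1) {
            throw new IllegalArgumentException("n and width must be positive");
        }
        BigInteger lo = toUnsigned(start, width);
        BigInteger hi = end.length == 0
                ? BigInteger.ONE.shiftLeft(8 * width) // one past the largest key of this width
                : toUnsigned(end, width);
        if (hi.compareTo(lo) <= 0) {
            throw new IllegalArgumentException("empty range at this key width");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[][]> ranges = new ArrayList<>();
        byte[] prevKey = start;
        BigInteger prevVal = lo;
        for (int i = 1; i <= n; i++) {
            BigInteger nextVal = lo.add(
                    span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (nextVal.equals(prevVal)) {
                continue; // range narrower than n keys: skip empty sub-ranges
            }
            // The last sub-range reuses the region's own end key unchanged.
            byte[] nextKey = (i == n) ? end : toKey(nextVal, width);
            ranges.add(new byte[][] {prevKey, nextKey});
            prevKey = nextKey;
            prevVal = nextVal;
        }
        return ranges;
    }
}
```

Each returned pair could then back its own input split (and so its own mapper), with the
scan restricted to that sub-range. If the keys are skewed, the sub-ranges will hold uneven
amounts of data, which is the trade-off against inspecting the HFiles.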

I still need to address feedback from [~tedyu], as well as some rough edges around how we create
recovered.edits files during the openRegion sequence.

cc [~enis] [~esteban] ?

> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>         Attachments: HBASE-18090-branch-1.3-v1.patch
> TableSnapshotInputFormat runs one map task per region in the table snapshot. This places
> an unnecessary restriction on users: the region layout of the original table has to take
> the processing resources available to the MR job into consideration. Allowing multiple
> mappers per region (assuming reasonably even key distribution) would be useful.

This message was sent by Atlassian JIRA