hbase-issues mailing list archives

From "xinxin fan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region
Date Sat, 09 Sep 2017 00:39:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159594#comment-16159594 ]

xinxin fan edited comment on HBASE-18090 at 9/9/17 12:38 AM:
-------------------------------------------------------------

Thanks for your review!

{quote}Before I go in reviews..opening regions in read-only mode for snapshots seems reasonable.
That change would only affect MR over snapshots codebase or some other paths too?{quote}
I think opening the regions in read-only mode affects only the MR-over-snapshots code path.

{quote} if we set readonly flag we skip replaying WAL and don't create those tmp files. {quote}
It seems that primary regions replay the recovered edits even when they are opened in read-only
mode; see HRegion#initializeRegionInternals:

{code:java}
if (ServerRegionReplicaUtil.shouldReplayRecoveredEdits(this)) {
  // Recover any edits if available.
  maxSeqId = Math.max(maxSeqId,
    replayRecoveredEditsIfAny(this.fs.getRegionDir(), maxSeqIdInStores, reporter, status));
  // Make sure mvcc is up to max.
  this.mvcc.advanceTo(maxSeqId);
}
{code}

{quote}Will that work for snapshots created with skipFlush option? Is it always safe to skip
WAL in that case?{quote}
The MR job works only on the snapshot's store files, so I think it makes no difference whether
the region is read-only or not. What do you think?



> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>            Assignee: xinxin fan
>         Attachments: HBASE-18090-branch-1.3-v1.patch, HBASE-18090-branch-1.3-v2.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. This places an
unnecessary restriction: the region layout of the original table has to take into account the
processing resources available to the MR job. Allowing multiple mappers per region
(assuming reasonably even key distribution) would be useful.
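
To illustrate the idea in the description, here is a minimal sketch of how one region's key range could be cut into several contiguous sub-ranges, one per mapper, under the same assumption of a reasonably even key distribution. It is not taken from the attached patches. The class and method names (RegionRangeSplitter, split, toFixedWidth) are hypothetical, the code uses only the JDK, and it treats row keys as unsigned big-endian integers padded with trailing zero bytes to a common width. It does not handle the empty start or end key of the first or last region.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of HBase.
final class RegionRangeSplitter {

    /**
     * Splits the key range [start, end) into n contiguous sub-ranges of
     * roughly equal numeric width. Returns n + 1 boundary keys; sub-range i
     * is [bounds.get(i), bounds.get(i + 1)).
     */
    static List<byte[]> split(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Pad both keys with trailing zeros so they compare as numbers
        // in the same order as they compare lexicographically.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> bounds = new ArrayList<>();
        bounds.add(start);
        for (int i = 1; i < n; i++) {
            // Interior boundary i sits at lo + span * i / n.
            BigInteger point = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            bounds.add(toFixedWidth(point, width));
        }
        bounds.add(end);
        return bounds;
    }

    /** Encodes a non-negative value as exactly width big-endian bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

Each pair of adjacent boundaries would then become the start and stop row of one mapper's scan. If the keys are skewed, or the range is narrower than n, some sub-ranges come out empty or unbalanced, which is why the description assumes an even key distribution.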



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
