Date: Sat, 9 Sep 2017 00:39:00 +0000 (UTC)
From: "xinxin fan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18090) Improve TableSnapshotInputFormat to allow more multiple mappers per region

[ https://issues.apache.org/jira/browse/HBASE-18090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159594#comment-16159594 ]

xinxin fan commented on HBASE-18090:
------------------------------------

Mikhail Antonov, thanks for your review!

{quote}Before I go in reviews..opening regions in read-only mode for snapshots seems reasonable. That change would only affect MR over snapshots codebase or some other paths too?{quote}
I think opening regions in read-only mode only affects the MR-over-snapshots code path.

{quote} if we set readonly flag we skip replaying WAL and don't create those tmp files. {quote}
It seems that primary regions still replay recovered edits even when opened in read-only mode; see HRegion#initializeRegionInternals:
{code:java}
if (ServerRegionReplicaUtil.shouldReplayRecoveredEdits(this)) {
  // Recover any edits if available.
  maxSeqId = Math.max(maxSeqId,
      replayRecoveredEditsIfAny(this.fs.getRegionDir(), maxSeqIdInStores, reporter, status));
  // Make sure mvcc is up to max.
  this.mvcc.advanceTo(maxSeqId);
}
{code}

{quote}Will that work for snapshots created with skipFlush option? Is it always safe to skip WAL in that case?{quote}
The MR job only works on the snapshot's store files, so I think it makes no difference whether the region is read-only or not. What do you think?

> Improve TableSnapshotInputFormat to allow more multiple mappers per region
> --------------------------------------------------------------------------
>
>                 Key: HBASE-18090
>                 URL: https://issues.apache.org/jira/browse/HBASE-18090
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>   Affects Versions: 1.4.0
>            Reporter: Mikhail Antonov
>            Assignee: xinxin fan
>         Attachments: HBASE-18090-branch-1.3-v1.patch, HBASE-18090-branch-1.3-v2.patch
>
>
> TableSnapshotInputFormat runs one map task per region in the table snapshot. This places an unnecessary restriction: the region layout of the original table has to take the processing resources available to the MR job into consideration. Allowing multiple mappers per region (assuming reasonably even key distribution) would be useful.
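
For illustration only, here is a minimal sketch of the idea in the issue description: cutting one region's key range into several contiguous sub-ranges, one per mapper, under the "reasonably even key distribution" assumption. This is not the attached patch and not HBase API. The class name RegionRangeSplitter and its split method are hypothetical, the code uses only the JDK, it zero-pads both keys to the same length before interpolating, and it does not handle the empty start/end keys of a table's first and last regions.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits one region's key range evenly.
final class RegionRangeSplitter {

    /**
     * Splits [startKey, endKey) into at most n contiguous sub-ranges of roughly
     * equal width. Each element of the result is {subStart, subEnd}.
     * Assumes non-empty keys whose zero-padded forms satisfy startKey < endKey.
     */
    static List<byte[][]> split(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Pad both keys on the right with zero bytes so they have equal length,
        // then treat them as unsigned big-endian integers.
        int len = Math.max(startKey.length, endKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger width = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[][]> ranges = new ArrayList<>();
        byte[] prev = startKey;
        for (int i = 1; i <= n; i++) {
            // Boundary i is lo + width * i / n; the last boundary is the region end key.
            byte[] next = (i == n)
                ? endKey
                : toKey(lo.add(width.multiply(BigInteger.valueOf(i)).divide(parts)), len);
            // Skip empty sub-ranges, which occur when the range is narrower than n.
            if (!Arrays.equals(prev, next)) {
                ranges.add(new byte[][] {prev, next});
                prev = next;
            }
        }
        return ranges;
    }

    /** Converts a non-negative value below 256^len to a big-endian key of exactly len bytes. */
    private static byte[] toKey(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than len
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Each returned pair could then back one input split, so a region spanning keys 0x00 to 0x10 with n = 4 would be read by four mappers instead of one. If keys are skewed, evenly spaced boundaries give uneven work per mapper, which is why the description adds the even-distribution caveat.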