Date: Wed, 13 Mar 2013 06:02:18 +0000 (UTC)
From: "chunhui shen (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-7403) Online Merge


     [ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-7403:
--------------------------------

    Description:
Support executing region merge transactions on the regionserver, similar to split transactions.

Process of merging two regions:
a. The client sends an RPC (dispatch merging regions) to the master.
b. The master moves the regions together (onto the same regionserver).
c. The master sends an RPC (merge regions) to the regionserver.
d. The regionserver executes the region merge transaction in the thread pool.
e. Steps b, c and d above run asynchronously.

Process of region merge transaction:
a. Construct a new region merge transaction.
b. Prepare for the merge transaction; the transaction is canceled if the merge is not possible,
   e.g. the two regions don't belong to the same table; the two regions are not adjacent in a non-compulsory merge; a region is closed or has references.
c. Execute the transaction through the following steps:
    /**
     * Set region as in transition, set it into MERGING state.
     */
    SET_MERGING_IN_ZK,
    /**
     * We created the temporary merge data directory.
     */
    CREATED_MERGE_DIR,
    /**
     * Closed the merging region A.
     */
    CLOSED_REGION_A,
    /**
     * The merging region A has been taken out of the server's online regions list.
     */
    OFFLINED_REGION_A,
    /**
     * Closed the merging region B.
     */
    CLOSED_REGION_B,
    /**
     * The merging region B has been taken out of the server's online regions list.
     */
    OFFLINED_REGION_B,
    /**
     * Started in on creation of the merged region.
     */
    STARTED_MERGED_REGION_CREATION,
    /**
     * Point of no return. If we got here, then transaction is not recoverable
     * other than by crashing out the regionserver.
     */
    PONR
d. Roll back if step c throws an exception.

Usage:
HBaseAdmin#mergeRegions

  was:
The features of this online merge:
1. Online; no need to disable the table.
2. Few changes to the current code; could be applied to trunk, 0.94, 0.92 or 0.90.
3. Easy to issue a merge request: no need to enter a long region name, the encoded name is enough.
4. No restrictions while operating: you don't need to take care of events like Server Dead, Balance, Split or Disabling/Enabling table, or of whether you sent a wrong merge request; this is already handled for you.
5. Only a short offline time for the two merging regions.

Usage:
1. Tool:
   bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show]
2. API: static void MergeManager#createMergeRequest

We need merge in the following cases:
1. Region holes or region overlaps that can't be fixed by hbck
2. Regions become empty because of TTL and unreasonable rowkey design
3. Regions are always empty or very small because of presplitting when the table was created
4. Too many empty or small
regions reduce system performance (e.g. MSLAB)

The current merge tools only work offline and cannot redo the merge if an exception is thrown during merging, which leaves dirty data.

An online system needs an online merge.

The logic this patch implements for Online Merge is as follows. For example, to merge regionA and regionB into regionC:
1. Offline the two regions A and B
2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories)
3. Add the merged regionC to .META.
4. Assign the merged regionC

By the design of this patch, once the merge work in HDFS has started, it can be redone until it succeeds if it throws an exception, aborts, or the server restarts, but it cannot be rolled back.

It depends on the following:
Use ZooKeeper to record the transaction journal state, which makes redo easier
Use ZooKeeper to send/receive merge requests
The merge transaction is executed on the master
Support calling a merge request through the API or the shell tool

About the merge process, please see the attachment and patch.


> Online Merge
> ------------
>
>                 Key: HBASE-7403
>                 URL: https://issues.apache.org/jira/browse/HBASE-7403
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.95.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.95.0, 0.98.0
>
>         Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, hbase-7403-trunkv13.patch, hbase-7403-trunkv14.patch, hbase-7403-trunkv15.patch, hbase-7403-trunkv16.patch, hbase-7403-trunkv19.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv20.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf
>
>
> Support executing region
> merge transactions on the regionserver, similar to split transactions.
>
> Process of merging two regions:
> a. The client sends an RPC (dispatch merging regions) to the master.
> b. The master moves the regions together (onto the same regionserver).
> c. The master sends an RPC (merge regions) to the regionserver.
> d. The regionserver executes the region merge transaction in the thread pool.
> e. Steps b, c and d above run asynchronously.
>
> Process of region merge transaction:
> a. Construct a new region merge transaction.
> b. Prepare for the merge transaction; the transaction is canceled if the merge is not possible,
>    e.g. the two regions don't belong to the same table; the two regions are not adjacent in a non-compulsory merge; a region is closed or has references.
> c. Execute the transaction through the following steps:
>     /**
>      * Set region as in transition, set it into MERGING state.
>      */
>     SET_MERGING_IN_ZK,
>     /**
>      * We created the temporary merge data directory.
>      */
>     CREATED_MERGE_DIR,
>     /**
>      * Closed the merging region A.
>      */
>     CLOSED_REGION_A,
>     /**
>      * The merging region A has been taken out of the server's online regions list.
>      */
>     OFFLINED_REGION_A,
>     /**
>      * Closed the merging region B.
>      */
>     CLOSED_REGION_B,
>     /**
>      * The merging region B has been taken out of the server's online regions list.
>      */
>     OFFLINED_REGION_B,
>     /**
>      * Started in on creation of the merged region.
>      */
>     STARTED_MERGED_REGION_CREATION,
>     /**
>      * Point of no return. If we got here, then transaction is not recoverable
>      * other than by crashing out the regionserver.
>      */
>     PONR
> d. Roll back if step c throws an exception.
>
> Usage:
> HBaseAdmin#mergeRegions

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
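
The journal-and-rollback behavior described in steps c and d of the region merge transaction can be sketched roughly as follows. This is a minimal, self-contained illustration and not the HBase implementation: the class MergeJournalSketch, the Result type and the failAfter parameter are hypothetical names introduced here, and only the step names come from the description. It assumes that each step is journaled before its work is done, that rollback undoes journaled steps newest first, and that nothing is undone once PONR has been journaled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch only; names other than the steps are hypothetical. */
final class MergeJournalSketch {

    /** Journal entries, named after the steps listed in the description. */
    enum Step {
        SET_MERGING_IN_ZK,
        CREATED_MERGE_DIR,
        CLOSED_REGION_A,
        OFFLINED_REGION_A,
        CLOSED_REGION_B,
        OFFLINED_REGION_B,
        STARTED_MERGED_REGION_CREATION,
        PONR
    }

    /** Outcome of one simulated transaction (hypothetical type). */
    static final class Result {
        final boolean completed;   // every step ran without failure
        final boolean rolledBack;  // failure occurred and was undone
        final List<Step> undone;   // steps undone, newest first

        Result(boolean completed, boolean rolledBack, List<Step> undone) {
            this.completed = completed;
            this.rolledBack = rolledBack;
            this.undone = undone;
        }
    }

    /**
     * Runs the steps in order. failAfter (may be null) simulates an
     * exception thrown while performing that step, after it was journaled.
     */
    static Result run(Step failAfter) {
        Deque<Step> journal = new ArrayDeque<>();
        for (Step step : Step.values()) {
            journal.push(step);          // record the step first (assumption)
            if (step == failAfter) {     // simulated exception in this step
                return rollback(journal);
            }
        }
        return new Result(true, false, new ArrayList<>());
    }

    /** Undoes journaled steps newest first, unless PONR was reached. */
    static Result rollback(Deque<Step> journal) {
        List<Step> undone = new ArrayList<>();
        if (journal.contains(Step.PONR)) {
            // Past the point of no return: not recoverable by rollback.
            return new Result(false, false, undone);
        }
        while (!journal.isEmpty()) {
            undone.add(journal.pop());   // a real undo action would go here
        }
        return new Result(false, true, undone);
    }

    public static void main(String[] args) {
        Result r = run(Step.CLOSED_REGION_A);
        System.out.println("rolledBack=" + r.rolledBack + " undone=" + r.undone);
        System.out.println("after PONR rolledBack=" + run(Step.PONR).rolledBack);
    }
}
```

In this sketch a failure before PONR leaves nothing journaled behind, while a failure after PONR is reported as not rolled back, matching the note that such a transaction is not recoverable other than by crashing out the regionserver.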