Return-Path: Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: (qmail 58792 invoked from network); 3 Aug 2010 21:37:04 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 3 Aug 2010 21:37:04 -0000 Received: (qmail 92085 invoked by uid 500); 3 Aug 2010 21:37:04 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 91962 invoked by uid 500); 3 Aug 2010 21:37:03 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Delivered-To: moderator for dev@hbase.apache.org Received: (qmail 86929 invoked by uid 99); 3 Aug 2010 21:35:36 -0000 X-ASF-Spam-Status: No, hits=1.8 required=10.0 tests=FH_HELO_EQ_D_D_D_D,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: Re: Review Request: HBASE-2697 The master rewrite From: "Jean-Daniel Cryans" To: "Karthik Ranganathan" , stack@duboce.net Date: Tue, 03 Aug 2010 21:35:07 -0000 Message-ID: <20100803213507.20605.33704@ip-10-202-7-187.ec2.internal> Cc: dev@hbase.apache.org, jiraposter@review.hbase.org, "Jonathan Gray" , "Jean-Daniel Cryans" In-Reply-To: <20100803120249.20604.26195@ip-10-202-7-187.ec2.internal> References: <20100803120249.20604.26195@ip-10-202-7-187.ec2.internal> X-Virus-Checked: Checked by ClamAV on apache.org ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/484/#review636 ----------------------------------------------------------- First pass, still trying to understand what's going on branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= CatalogTracker.java unused? branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaEditor.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaEditor.java Possible NPE, server can be null branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaEditor.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaEditor.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java Possible NPE, metaServer can be null branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java javadoc for this method and the next one branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java This method and a few other ones in this class looks almost the same, I= think there's a refactoring potential branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java Possible NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java Somehow this line looks weird to me, not sure if it's how you build the= row or that you don't use a final for ",," branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/= MetaReader.java NPE branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/executor= /EventHandler.java could enclose in a ifDebugEnabled branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java Javadoc looks anemic compared to the importance of that class branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java Could you handle if regionState is null at this level? branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java Is this code coming back later? branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java Looks fishy, what happens if anyone uses that state outside of that met= hod? branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java Any investigation needed? branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/A= ssignmentManager.java What happens if it never returns? Also I wonder if it's feasible that t= he assignment somehow ends up waiting on itself because of region server fa= ilure branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionse= rver/handler/CloseRegionHandler.java I see red boxes :) branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionse= rver/handler/OpenRegionHandler.java red - Jean-Daniel On 2010-08-03 05:02:49, Jonathan Gray wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.cloudera.org/r/484/ > ----------------------------------------------------------- > = > (Updated 2010-08-03 05:02:49) > = > = > Review request for hbase, stack and Karthik Ranganathan. > = > = > Summary > ------- > = > This is basically the master rewrite. It's massive, I apologize. I can = start a cluster, create a table, insert a row, and get the row. Restarting= should work too. > = > I completely ripped out BaseScanner/MetaScanner as well as the RegionServ= erOperationQueue stuff. I'm sold on the former but there's a bit of work t= o fully replace the latter though I think it's a desirable direction to get= rid of it. The RSOQ stuff has been replaced with EventHandlers. The Base= Scanner is no longer necessary because we are not utilizing META to determi= ne when things are unassigned. > = > All the table operations (enable/disable, create/delete, addcolumn/del/mo= dify) have been reimplemented in handlers. > = > All META/ROOT server access is done through the new CatalogManager. This= uses ZK and ROOT/META to keep active track of the current locations of the= catalog regions. A bit more work needs to be done on defining our retry s= emantics for different operations. > = > The actual read/write operations on ROOT/META are done with new classes R= ootEditor, MetaEditor, and MetaReader. A lot of this code was scattered ar= ound HMaster, RegionManager, ServerManager, HRegion, etc... > = > Some new methods in FileSystemManager, collected from being scattered aro= und. > = > This also does final straw to old ZKWrapper and removes it. All ZK stuff= is going through the new stuff. > = > Open/close is now going via RPC and ZK, no longer piggybacking any of the= se messages on heartbeats (and with splits not working right now, even that= is gone, so there will be nothing it seems) > = > RegionManager was completely removed and replaced with a shiny, brand new= AssignmentManager. It's already managed to get a bit hectic but is pretty= straightforward. It is the heart of zk-based region assignment and most l= ogic is in here. It contains regionsInTransition, the region plans, etc... > = > I added a new /disabled node in ZK to keep track of tables that are disab= led. I think we need this to be able to reliably do disables across failur= es. > = > Among other changes.... > = > = > Here's my rough list of things left to do (covers most but not all TODOs = in the branch) > = > = > * make final decisions on root/meta timeouts. almost everyone is coordin= ating > access through CatalogTracker which should make it easy to standardize. > if there are operations that should just retry indefinitely, they need = to > resubmit themselves to their executor service. > = > * move splits to RS side, integrate new patch from stack on trunk > might need a new CREATED unassigned now, or new rpc, but get rid of sen= ding > split notification on heartbeat? > how to handle splits concurrent with disable? > = > * review master startup order > = > * figure what to do with client table admin ops (flush, split, compact) > = > * MasterAddressManager move to extending ZooKeeperNodeTracker > = > * on region open (and wherever split children notify master) should check= if > if the table is disabled and should close the regions... maybe. > = > * regionserver exit needs to be reimplemented in servermanager > also rs expiration > = > * add priorities and pool size to handlers (bring in from flush patch) > = > * in RootEditor there is a race condition between delete and watch? > = > * migrate TestZKBased* tests to use new handlers > = > * make sync calls for enable/disable (check and verify methods?) > = > * integrate load balancing > = > * finish TODOs on new failover path and remove old code in joinExistingCl= uster() > = > * finish TODOs on timeout monitor in assignmentmanager > = > * review filesystemmanager calls > = > * figure how to handle the very rare but possible race condition where two > RSs will update META and the later one can squash the valid one if ther= e was > a long gc pause > = > * synchronize all access to the boolean in ActiveMasterManager > (now this is probably just move it to extend ZKNodeTracker) > = > * there are some races with master wanting to connect for rpc > to regionserver and the rs starting its rpc server, need to address > = > * migrate TestMasterTransitions or make new? > = > * fix or remove last couple master tests that used RSOQ > = > * write new tests!!! > = > = > This addresses bug HBASE-2697. > http://issues.apache.org/jira/browse/HBASE-2697 > = > = > Diffs > ----- > = > branches/0.90_master_rewrite/BRANCH_TODO.txt 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Abor= table.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Clus= terStatus.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/HReg= ionInfo.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/HSer= verInfo.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Inva= lidFamilyException.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Serv= erController.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/cata= log/CatalogTracker.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/cata= log/MetaEditor.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/cata= log/MetaReader.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/cata= log/RootEditor.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/clie= nt/HConnectionManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/clie= nt/MetaScanner.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/exec= utor/EventHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/exec= utor/HBaseEventHandler.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/exec= utor/HBaseExecutorService.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/exec= utor/RegionTransitionData.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/ipc/= HRegionInterface.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ActiveMasterManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/AddColumn.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/AssignmentManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/BaseScanner.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ChangeTableState.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ColumnOperation.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/DeleteColumn.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/FileSystemManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/HMaster.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/LoadBalancer.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/MasterController.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/MasterStatus.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/MetaRegion.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/MetaScanner.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ModifyColumn.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ModifyTableMeta.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ProcessRegionClose.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ProcessRegionOpen.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ProcessRegionStatusChange.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ProcessServerShutdown.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RegionManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RegionServerOperation.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RegionServerOperationListener.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RegionServerOperationQueue.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RetryableMetaOperation.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/RootScanner.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/ServerManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/TableDelete.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/TableOperation.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/ClosedRegionHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/DeleteTableHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/DisableTableHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/EnableTableHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/MasterCloseRegionHandler.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/MasterOpenRegionHandler.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/ModifyTableHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/OpenedRegionHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/ServerShutdownHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/TableAddFamilyHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/TableDeleteFamilyHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/TableEventHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/mast= er/handler/TableModifyFamilyHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/CompactSplitThread.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/HRegion.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/HRegionServer.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/MasterAddressManager.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/RSZookeeperUpdater.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/RegionServerController.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/handler/CloseRegionHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regi= onserver/handler/OpenRegionHandler.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/MetaNodeTracker.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/RootRegionTracker.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZKAssign.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZKTableDisable.java PRE-CREATION = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZKUtil.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZooKeeperListener.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZooKeeperNodeTracker.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZooKeeperWatcher.java 979909 = > branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zook= eeper/ZooKeeperWrapper.java 979909 = > branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/HBas= eTestingUtility.java 979909 = > branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/mast= er/TestActiveMasterManager.java 979909 = > branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/mast= er/TestMaster.java 979909 = > branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/mast= er/TestRestartCluster.java 979909 = > = > Diff: http://review.cloudera.org/r/484/diff > = > = > Testing > ------- > = > simple cluster tests passing, need to run (and write) more tests > = > = > Thanks, > = > Jonathan > = >