Return-Path: X-Original-To: apmail-hbase-dev-archive@www.apache.org Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2FF93640C for ; Sat, 28 May 2011 04:04:43 +0000 (UTC) Received: (qmail 94910 invoked by uid 500); 28 May 2011 04:04:42 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 94698 invoked by uid 500); 28 May 2011 04:04:42 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 94690 invoked by uid 99); 28 May 2011 04:04:42 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 May 2011 04:04:42 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of saint.ack@gmail.com designates 209.85.216.176 as permitted sender) Received: from [209.85.216.176] (HELO mail-qy0-f176.google.com) (209.85.216.176) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 May 2011 04:04:35 +0000 Received: by qyk30 with SMTP id 30so1545214qyk.14 for ; Fri, 27 May 2011 21:04:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; bh=XC0Gg4Qcz0GFmDKq0XD/ex5TBVTxwvYOaIN9vXaPi48=; b=M90XtEv/lzzwXWqlJud//rs10+buycCIa79eMwh5FDVg+HVPris4d/a9BtBnrCrHRJ g927KCpDp2YMyYrDoD1heZv6RvYgWjfbJuDAj36dPch+Am/eMu5LyYnpuMvLQNMrbHH3 1kQBFTKqMnILk3DVDHZGiU1vCHCOTelxLw9ng= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; b=FYuj2sn4C6Xboxo0u9c7UmImjV89Kc8coUq+/uZ98vI99Y7N8KRoX/OymiSiIAoLU3 5gBajGoLs9MVrjPizKYIe54Iat2N+0OLYVHhGLF15zwuUdui+svy4MRRz5DxQ61eyoWI 4u3jSPmoOeDKNwW1zXZvxyPPKydRS4i/XityE= MIME-Version: 1.0 Received: by 10.224.219.68 with SMTP id ht4mr2102325qab.217.1306555454211; Fri, 27 May 2011 21:04:14 -0700 (PDT) Sender: saint.ack@gmail.com Received: by 10.224.67.21 with HTTP; Fri, 27 May 2011 21:04:14 -0700 (PDT) In-Reply-To: References: Date: Fri, 27 May 2011 21:04:14 -0700 X-Google-Sender-Auth: -uKHpjWyvCc6ts9OYA11XBqQRK8 Message-ID: Subject: Re: Hmaster is NullPointerException From: Stack To: dev@hbase.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org The below is hard to read but I think I follow you Gao. Makes sense. Do you have a patch? (How did you manufacture the below? Was it by accident? Its a good one). St.Ack On Fri, May 27, 2011 at 7:04 PM, Gaojinchao wrote: > issue: > NullPointerException while hmaster starting. > =A0 =A0 =A0 java.lang.NullPointerException > =A0 =A0 =A0 =A0 at java.util.TreeMap.getEntry(TreeMap.java:324) > =A0 =A0 =A0 =A0 at java.util.TreeMap.get(TreeMap.java:255) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.addTo= Servers(AssignmentManager.java:1512) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.regio= nOnline(AssignmentManager.java:606) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.proce= ssFailover(AssignmentManager.java:214) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.HMaster.finishInitializ= ation(HMaster.java:402) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.jav= a:283) > > // below is my analysis > private void finishInitialization() > throws IOException, InterruptedException, KeeperException > { > > =A0 =A0/* > =A0 =A0 =A0 =A0In the function, waitForRegionServers return when some reg= ionservers checkin. > =A0 =A0 =A0 =A0The regionservers include some regions, but not root-regio= n and meta-region. > =A0 =A0 =A0 =A0(the Registered servers are 157-5-111-13, 157-5-111-14, 15= 7-5-111-12) > =A0 =A0*/ > =A0 =A0int regionCount =3D his.serverManager.waitForRegionServers(); > > =A0 =A0/* > =A0 =A0 =A0 =A0In the function, we can know verifyRootRegionLocation/veri= fyMetaRegionLocation are succeed with the log of assigned=3D0, > =A0 =A0 =A0 =A0and the root-regionserver and meta-regionserver are runnin= g. > =A0 =A0 =A0 =A0(The 157-5-111-11 server is running, but not Registered. a= nd server include root-region and meta-region.) > =A0 =A0*/ > =A0 =A0assignRootAndMeta(); > > =A0 =A0if (regionCount =3D=3D 0) { > =A0 =A0 =A0LOG.info("Master startup proceeding: cluster startup"); > =A0 =A0 =A0this.assignmentManager.cleanoutUnassigned(); > =A0 =A0 =A0this.assignmentManager.assignAllUserRegions(); > =A0 =A0} else { > =A0 =A0 =A0LOG.info("Master startup proceeding: master failover"); > > =A0 =A0 =A0/* > =A0 =A0 =A0 =A0HServerInfo hsi =3D this.serverManager.getHServerInfo(this= .catalogTracker.getMetaLocation()); > =A0 =A0 =A0 =A0In processFailover function, call getHServerInfo return NU= LL pointer, hsi is NULL pointer. > =A0 =A0 =A0 =A0Because the regionserver with meta-region has not Register= ed. > =A0 =A0 =A0*/ > =A0 =A0 =A0this.assignmentManager.processFailover(); > =A0 =A0} > } > > the exception logs: > > 2011-05-21 14:44:47,973 INFO org.apache.hadoop.hbase.master.ServerManager= : Waiting on regionserver(s) to checkin > 2011-05-21 14:44:49,473 INFO org.apache.hadoop.hbase.master.ServerManager= : Waiting on regionserver(s) to checkin > 2011-05-21 14:44:50,974 INFO org.apache.hadoop.hbase.master.ServerManager= : Waiting on regionserver(s) to checkin > 2011-05-21 14:44:51,281 INFO org.apache.hadoop.hbase.master.ServerManager= : Registering server=3D157-5-111-13,20020,1305877624933, regionCount=3D1543= , userLoad=3Dtrue > 2011-05-21 14:44:51,722 INFO org.apache.hadoop.hbase.master.ServerManager= : Registering server=3D157-5-111-14,20020,1305877627727, regionCount=3D1507= , userLoad=3Dtrue > 2011-05-21 14:44:51,774 INFO org.apache.hadoop.hbase.master.ServerManager= : Registering server=3D157-5-111-12,20020,1305877626108, regionCount=3D1521= , userLoad=3Dtrue > 2011-05-21 14:44:52,474 INFO org.apache.hadoop.hbase.master.ServerManager= : Waiting on regionserver(s) count to settle; currently=3D3 > 2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager= : Finished waiting for regionserver count to settle; count=3D3, sleptFor=3D= 18000 > 2011-05-21 14:44:53,974 INFO org.apache.hadoop.hbase.master.ServerManager= : Exiting wait on regionserver(s) to checkin; count=3D3, stopped=3Dfalse, c= ount of regions out on cluster=3D4571 > 2011-05-21 14:44:53,978 INFO org.apache.hadoop.hbase.master.MasterFileSys= tem: Log folder hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,130= 5875930161 doesn't belong to a known region server, splitting > 2011-05-21 14:44:53,991 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Splitting 2 hlog(s) in hdfs://157.5.111.10:9000/hbase/.logs/157-= 5-111-11,20020,1305875930161 > 2011-05-21 14:44:53,992 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-0,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-2,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-3,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-1,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-6,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-5,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-4,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-8,5,main]: starting > 2011-05-21 14:44:53,993 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-7,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-9,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-11,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-10,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-12,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-13,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-14,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Writer thread Thread[WriterThread-15,5,main]: starting > 2011-05-21 14:44:53,994 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Splitting hlog 1 of 2: hdfs://157.5.111.10:9000/hbase/.logs/157= -5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857, length=3D= 1503902 > 2011-05-21 14:44:53,996 INFO org.apache.hadoop.hbase.util.FSUtils: Recove= ring file hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,130587593= 0161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:54,998 INFO org.apache.hadoop.hbase.util.FSUtils: Finish= ed lease recover attempt for hdfs://157.5.111.10:9000/hbase/.logs/157-5-111= -11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:55,140 INFO org.apache.hadoop.hbase.regionserver.wal.Seq= uenceFileLogWriter: Using syncFs -- HDFS-200 > 2011-05-21 14:44:55,141 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Creating writer path=3Dhdfs://157.5.111.10:9000/hbase/-ROOT-/70= 236052/recovered.edits/0000000000000021721 region=3D70236052 > 2011-05-21 14:44:55,177 INFO org.apache.hadoop.hbase.regionserver.wal.Seq= uenceFileLogWriter: Using syncFs -- HDFS-200 > 2011-05-21 14:44:55,177 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Creating writer path=3Dhdfs://157.5.111.10:9000/hbase/.META./10= 28785192/recovered.edits/0000000000000021722 region=3D1028785192 > 2011-05-21 14:44:55,435 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Pushed=3D5400 entries from hdfs://157.5.111.10:9000/hbase/.logs= /157-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:55,436 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Splitting hlog 2 of 2: hdfs://157.5.111.10:9000/hbase/.logs/157= -5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259, length=3D= 0 > 2011-05-21 14:44:55,436 INFO org.apache.hadoop.hbase.util.FSUtils: Recove= ring file hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,130587593= 0161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,439 INFO org.apache.hadoop.hbase.util.FSUtils: Finish= ed lease recover attempt for hdfs://157.5.111.10:9000/hbase/.logs/157-5-111= -11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,439 WARN org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: File hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-11,20020,130= 5875930161/157-5-111-11%3A20020.1305879556259 might be still open, length i= s 0 > 2011-05-21 14:44:56,451 DEBUG org.apache.hadoop.hbase.regionserver.wal.HL= ogSplitter: Pushed=3D0 entries from hdfs://157.5.111.10:9000/hbase/.logs/15= 7-5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,489 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Archived processed log hdfs://157.5.111.10:9000/hbase/.logs/157-= 5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305875949857 to hdfs://1= 57.5.111.10:9000/hbase/.oldlogs/157-5-111-11%3A20020.1305875949857 > 2011-05-21 14:44:56,506 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Archived processed log hdfs://157.5.111.10:9000/hbase/.logs/157-= 5-111-11,20020,1305875930161/157-5-111-11%3A20020.1305879556259 to hdfs://1= 57.5.111.10:9000/hbase/.oldlogs/157-5-111-11%3A20020.1305879556259 > 2011-05-21 14:44:56,523 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Waiting for split writer threads to finish > 2011-05-21 14:44:57,438 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Split writers finished > 2011-05-21 14:44:57,455 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Closed path hdfs://157.5.111.10:9000/hbase/.META./1028785192/rec= overed.edits/0000000000000021722 (wrote 5399 edits in 322ms) > 2011-05-21 14:44:57,505 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: Closed path hdfs://157.5.111.10:9000/hbase/-ROOT-/70236052/recov= ered.edits/0000000000000021721 (wrote 1 edits in 106ms) > 2011-05-21 14:44:57,505 INFO org.apache.hadoop.hbase.regionserver.wal.HLo= gSplitter: hlog file splitting completed in 3517 ms for hdfs://157.5.111.10= :9000/hbase/.logs/157-5-111-11,20020,1305875930161 > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSys= tem: Log folder hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-12,20020,130= 5877626108 belongs to an existing region server > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSys= tem: Log folder hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-13,20020,130= 5877624933 belongs to an existing region server > 2011-05-21 14:44:57,506 INFO org.apache.hadoop.hbase.master.MasterFileSys= tem: Log folder hdfs://157.5.111.10:9000/hbase/.logs/157-5-111-14,20020,130= 5877627727 belongs to an existing region server > 2011-05-21 14:44:57,524 INFO org.apache.hadoop.hbase.master.HMaster: -ROO= T- assigned=3D0, rit=3Dfalse, location=3D157-5-111-11:20020 > 2011-05-21 14:44:57,528 INFO org.apache.hadoop.hbase.master.HMaster: .MET= A. assigned=3D0, rit=3Dfalse, location=3D157-5-111-11:20020 > 2011-05-21 14:44:57,528 INFO org.apache.hadoop.hbase.master.HMaster: Mast= er startup proceeding: master failover > 2011-05-21 14:44:57,530 FATAL org.apache.hadoop.hbase.master.HMaster: Unh= andled exception. Starting shutdown. > java.lang.NullPointerException > =A0 =A0 =A0 =A0 at java.util.TreeMap.getEntry(TreeMap.java:324) > =A0 =A0 =A0 =A0 at java.util.TreeMap.get(TreeMap.java:255) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.addTo= Servers(AssignmentManager.java:1512) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.regio= nOnline(AssignmentManager.java:606) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.AssignmentManager.proce= ssFailover(AssignmentManager.java:214) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.HMaster.finishInitializ= ation(HMaster.java:402) > =A0 =A0 =A0 =A0 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.jav= a:283) > 2011-05-21 14:44:57,531 INFO org.apache.hadoop.hbase.master.HMaster: Abor= ting > 2011-05-21 14:44:57,531 DEBUG org.apache.hadoop.hbase.master.HMaster: Sto= pping service threads >