From: Todd Lipcon
Date: Thu, 10 Feb 2011 17:05:17 -0800
Subject: Re: initial experience with HBase 0.90.1 rc0
To: dev@hbase.apache.org

On Thu, Feb 10, 2011 at 4:54 PM, Ted Yu wrote:

> Thanks for the explanation.
> Assuming the mixed class loading is static, why did this situation develop
> after 40 minutes of heavy load :-(
>

You didn't hit global memstore pressure until 40 minutes of load.

-Todd

On Thu, Feb 10, 2011 at 4:42 PM, Ryan Rawson wrote:
>
> > It's a standard linking issue, you get one class from one version
> > another from another, they are mostly compatible in terms of
> > signatures (hence no exceptions) but are subtly incompatible in
> > different ways. In the stack trace you posted, the handlers were
> > blocked in:
> >
> >     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:382)
> >
> > and the thread:
> >
> > "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00002aaabc21e000
> > nid=0x7717 waiting for monitor entry [0x0000000000000000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >
> > was idle.
> >
> > The cache flusher thread should be flushing, and yet it's doing
> > nothing. This also happens to be one of the classes that were
> > changed.
> >
> > On Thu, Feb 10, 2011 at 4:34 PM, Ted Yu wrote:
> > > Can someone comment on my second question ?
> > > Thanks
> > >
> > > On Thu, Feb 10, 2011 at 4:25 PM, Ryan Rawson wrote:
> > >
> > >> As I suspected.
> > >>
> > >> It's a byproduct of our maven assembly process. The process could be
> > >> fixed. I wouldn't mind. I don't support runtime checking of jars,
> > >> there is such a thing as too many tests, and this is an example of it.
> > >> The check would then need a test, etc, etc.
> > >>
> > >> At SU we use new directories for each upgrade, copying the config
> > >> over. With the lack of -default.xml this is easier than ever (just
> > >> copy everything in conf/). With symlink switchover it makes roll
> > >> forward/back as simple as doing a symlink switchover or back. I have
> > >> to recommend this to everyone who doesn't have a management scheme.
> > >>
> > >> On Thu, Feb 10, 2011 at 4:20 PM, Ted Yu wrote:
> > >> > hbase/hbase-0.90.1.jar leads lib/hbase-0.90.0.jar in the classpath.
> > >> > I wonder
> > >> > 1. why hbase jar is placed in two directories - 0.20.6 didn't use such
> > >> >    structure
> > >> > 2. what from lib/hbase-0.90.0.jar could be picked up and why there wasn't
> > >> >    exception in server log
> > >> >
> > >> > I think a JIRA should be filed for item 2 above - bail out when the two
> > >> > hbase jars from $HBASE_HOME and $HBASE_HOME/lib are of different versions.
> > >> >
> > >> > Cheers
> > >> >
> > >> > On Thu, Feb 10, 2011 at 3:40 PM, Ryan Rawson wrote:
> > >> >
> > >> >> What do you get when you:
> > >> >>
> > >> >> ls lib/hbase*
> > >> >>
> > >> >> I'm going to guess there is hbase-0.90.0.jar there
> > >> >>
> > >> >> On Thu, Feb 10, 2011 at 3:25 PM, Ted Yu wrote:
> > >> >> > hbase-0.90.0-tests.jar and hbase-0.90.1.jar co-exist
> > >> >> > Would this be a problem ?
> > >> >> >
> > >> >> > On Thu, Feb 10, 2011 at 3:16 PM, Ryan Rawson wrote:
> > >> >> >
> > >> >> >> You don't have both the old and the new hbase jars in there do you?
> > >> >> >>
> > >> >> >> -ryan
> > >> >> >>
> > >> >> >> On Thu, Feb 10, 2011 at 3:12 PM, Ted Yu wrote:
> > >> >> >> > .META. went offline during second flow attempt.
> > >> >> >> >
> > >> >> >> > The time out I mentioned happened for 1st and 3rd attempts. HBase was
> > >> >> >> > restarted before the 1st and 3rd attempts.
> > >> >> >> >
> > >> >> >> > Here is jstack:
> > >> >> >> > http://pastebin.com/EHMSvsRt
> > >> >> >> >
> > >> >> >> > On Thu, Feb 10, 2011 at 3:04 PM, Stack wrote:
> > >> >> >> >
> > >> >> >> >> So, .META. is not online? What happens if you use shell at this time.
> > >> >> >> >>
> > >> >> >> >> Your attachment did not come across Ted. Mind pastebin'ing it?
> > >> >> >> >>
> > >> >> >> >> St.Ack
> > >> >> >> >>
> > >> >> >> >> On Thu, Feb 10, 2011 at 2:41 PM, Ted Yu wrote:
> > >> >> >> >> > I replaced hbase jar with hbase-0.90.1.jar
> > >> >> >> >> > I also upgraded client side jar to hbase-0.90.1.jar
> > >> >> >> >> >
> > >> >> >> >> > Our map tasks were running faster than before for about 50 minutes.
> > >> >> >> >> > However, map tasks then timed out calling flushCommits(). This happened
> > >> >> >> >> > even after fresh restart of hbase.
> > >> >> >> >> >
> > >> >> >> >> > I don't see any exception in region server logs.
> > >> >> >> >> >
> > >> >> >> >> > In master log, I found:
> > >> >> >> >> >
> > >> >> >> >> > 2011-02-10 18:24:15,286 DEBUG
> > >> >> >> >> > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> > >> >> >> >> > -ROOT-,,0.70236052 on sjc1-hadoop6.X.com,60020,1297362251595
> > >> >> >> >> > 2011-02-10 18:24:15,349 INFO org.apache.hadoop.hbase.catalog.CatalogTracker:
> > >> >> >> >> > Failed verification of .META.,,1 at address=null;
> > >> >> >> >> > org.apache.hadoop.hbase.NotServingRegionException:
> > >> >> >> >> > org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
> > >> >> >> >> > .META.,,1
> > >> >> >> >> > 2011-02-10 18:24:15,350 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > >> >> >> >> > master:60000-0x12e10d0e31e0000 Creating (or updating) unassigned node for
> > >> >> >> >> > 1028785192 with OFFLINE state
> > >> >> >> >> >
> > >> >> >> >> > I am attaching region server (which didn't respond to stop-hbase.sh) jstack.
> > >> >> >> >> >
> > >> >> >> >> > FYI
> > >> >> >> >> >
> > >> >> >> >> > On Thu, Feb 10, 2011 at 10:10 AM, Stack wrote:
> > >> >> >> >> >>
> > >> >> >> >> >> That's probably enough Ted. The 0.90.1 hbase-default.xml has an extra
> > >> >> >> >> >> config. to enable the experimental HBASE-3455 feature but you can
> > >> >> >> >> >> copy that over if you want to try playing with it (it defaults off so
> > >> >> >> >> >> you'd copy over the config. if you wanted to set it to true).
> > >> >> >> >> >>
> > >> >> >> >> >> St.Ack

--
Todd Lipcon
Software Engineer, Cloudera
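
The thread turns on two hbase jars of different versions ending up on one classpath (hbase-0.90.1.jar alongside lib/hbase-0.90.0.jar). As an illustration only, here is a minimal sketch of an offline check that could be run against a jar listing before starting a cluster. It is not part of HBase, and it is not the runtime check proposed in the thread; the function names `jar_versions` and `has_version_conflict`, and the assumption that the version can be read from the file name, are hypothetical.

```python
import os
import re
from collections import defaultdict


def jar_versions(jar_paths, artifact="hbase"):
    """Group jar paths by the version embedded in the file name.

    Assumes names of the form <artifact>-<version>[-<classifier>].jar,
    for example hbase-0.90.1.jar or hbase-0.90.0-tests.jar.
    """
    pattern = re.compile(
        r"^" + re.escape(artifact) + r"-(\d+(?:\.\d+)*)(?:-[A-Za-z]\w*)?\.jar$"
    )
    versions = defaultdict(list)
    for path in jar_paths:
        match = pattern.match(os.path.basename(path))
        if match:
            versions[match.group(1)].append(path)
    return dict(versions)


def has_version_conflict(jar_paths, artifact="hbase"):
    """Return True when more than one version of the artifact is present."""
    return len(jar_versions(jar_paths, artifact)) > 1


if __name__ == "__main__":
    # Jar names taken from the thread; the directory layout is assumed.
    listing = ["hbase-0.90.1.jar", "lib/hbase-0.90.0.jar"]
    for version, paths in sorted(jar_versions(listing).items()):
        print(version, paths)
    if has_version_conflict(listing):
        print("more than one hbase version found; remove the stale jar")
```

A check like this only looks at file names, so it would miss a jar that was renamed or repackaged. Installing each release into a fresh directory and switching a symlink, as recommended earlier in the thread, avoids the stale jar in the first place.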