Return-Path: Delivered-To: apmail-hadoop-hbase-dev-archive@locus.apache.org Received: (qmail 90381 invoked from network); 5 Aug 2008 17:45:07 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 5 Aug 2008 17:45:07 -0000 Received: (qmail 24577 invoked by uid 500); 5 Aug 2008 17:45:04 -0000 Delivered-To: apmail-hadoop-hbase-dev-archive@hadoop.apache.org Received: (qmail 24502 invoked by uid 500); 5 Aug 2008 17:45:04 -0000 Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-dev@hadoop.apache.org Delivered-To: mailing list hbase-dev@hadoop.apache.org Received: (qmail 24415 invoked by uid 99); 5 Aug 2008 17:45:04 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Aug 2008 10:45:04 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Aug 2008 17:44:17 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 6AA08234C190 for ; Tue, 5 Aug 2008 10:44:44 -0700 (PDT) Message-ID: <1581326533.1217958284435.JavaMail.jira@brutus> Date: Tue, 5 Aug 2008 10:44:44 -0700 (PDT) From: "Jonathan Gray (JIRA)" To: hbase-dev@hadoop.apache.org Subject: [jira] Updated: (HBASE-790) During import, single region blocks requests for >10 minutes, thread dumps, throws out pending requests, and continues In-Reply-To: <1599800705.1217551112023.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-790: -------------------------------- Status: Patch Available (was: Open) After spending quite a bit of time running different load tests on v2 patch, it appears the problem did still exist. Stack pointed me towards looking into the possibility of distributed state. I dug in and it seemed that state was being spread across two different locations. To fix this, I moved the flushRequested boolean into the WriteState class. This allowed me to make it so its state can only be changed when WriteState was synchronized on. Upon further testing it seems that this does work and I'm no longer seeing this issue. I am +1 on 790 with this patch, and +1 on rolling 0.2.0 RC2 > During import, single region blocks requests for >10 minutes, thread dumps, throws out pending requests, and continues > ---------------------------------------------------------------------------------------------------------------------- > > Key: HBASE-790 > URL: https://issues.apache.org/jira/browse/HBASE-790 > Project: Hadoop HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 0.2.0 > Environment: 11 node cluster. 1 master w/ namenodes and hmaster. 10 slaves w/ datanodes and regionservers. All are 2GHz quad core xeons, 4gb ram, raid 0. > Reporter: Jonathan Gray > Assignee: Andrew Purtell > Priority: Blocker > Fix For: 0.2.0 > > Attachments: 790-v2.patch, 790-v3.patch, 790.patch, regionserver-lockup.log > > > During a batch import, I have two processes importing into a single region. > The behavior I saw was a regionserver with 2 regions of the table in question on it. The first region split, and the new regions were reassigned to another regionserver. > Following that, inserting into the region that was left over began to block client requests. I am attaching the regionserver log; below is the specific problem area: > 2008-07-31 15:38:24,190 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit in table locations for row <> and tableName .META.: location server 72.34.249.217:60020, location region name .META.,,1 > 2008-07-31 15:38:24,194 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META updated, and report to master all successful. Old region=REGION => {NAME => 'items,01beddd6-813b-4f2b-ac48-a0cef395cb7e,12175434512 > 2008-07-31 15:38:34,052 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296: Memcache size 64.0m is >= than blocking > 2008-07-31 15:39:00,270 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020, call batchUpdate([B@17b4239f, row => 02c241b4-9d32-452d-8dab-247f4af693eb, {column => content:title, value => '...', column => content:content, va > org.apache.hadoop.hbase.NotServingRegionException: items,01beddd6-813b-4f2b-ac48-a0cef395cb7e,1217543451296 > at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:1436) > at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1147) > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:473) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896) > 2008-07-31 15:39:09,547 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296: Memcache size 64.0m is >= than blocking > 2008-07-31 15:39:44,079 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296: Memcache size 64.0m is >= than blocking > 2008-07-31 15:40:19,574 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296: Memcache size 64.0m is >= than blocking > 2008-07-31 15:49:09,130 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 1 > 2008-07-31 15:49:09,144 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Closing current log writer /hbase/log_72.34.249.212_1217535541159_60020/hlog.dat.1217543884691 > 2008-07-31 15:49:09,146 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /hbase/log_72.34.249.212_1217535541159_60020/hlog.dat.1217544549145 > 2008-07-31 16:03:09,060 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296. Current region memcache size 64.0m > 2008-07-31 16:03:09,467 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region items,8001eb31-98bb-4087-bd8d-e4b42805addb,1217543451296 'IPC Server handler 5 on 60020' > 2008-07-31 16:03:09,478 INFO org.apache.hadoop.ipc.Server: Process Thread Dump: Discarding call batchUpdate([B@4e727e0e, row => c08408b4-b68c-433c-ba3f-d46d3ba73288, {column => content:title, value => '...', column => content:content, v > As you can see there was a 14 minute delay between updates being blocked, and the unblocking occurring. > All the pending batchUpdates were thrown out (too old) and then importing proceeded normally. > The same behavior repeated itself later on a different regionserver, and again after a while it unfroze, kicked out pending updates, and continued. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.