Return-Path: Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org Received: (qmail 90576 invoked from network); 23 Jan 2008 23:24:56 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 23 Jan 2008 23:24:56 -0000 Received: (qmail 14945 invoked by uid 500); 23 Jan 2008 23:24:45 -0000 Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org Received: (qmail 14915 invoked by uid 500); 23 Jan 2008 23:24:45 -0000 Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-dev@hadoop.apache.org Received: (qmail 14906 invoked by uid 500); 23 Jan 2008 23:24:45 -0000 Delivered-To: apmail-lucene-hadoop-dev@lucene.apache.org Received: (qmail 14903 invoked by uid 99); 23 Jan 2008 23:24:45 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Jan 2008 15:24:45 -0800 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Jan 2008 23:24:28 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 7EB64714159 for ; Wed, 23 Jan 2008 15:24:34 -0800 (PST) Message-ID: <6735414.1201130674506.JavaMail.jira@brutus> Date: Wed, 23 Jan 2008 15:24:34 -0800 (PST) From: "stack (JIRA)" To: hadoop-dev@lucene.apache.org Subject: [jira] Resolved: (HADOOP-2682) regionserver wedged: thread holds lock during blocking read (?) In-Reply-To: <821191.1201044876261.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HADOOP-2682. --------------------------- Resolution: Invalid Fix Version/s: 0.16.0 Chatting w/ Michael, turns out a bunch of datanodes were stuck in swap-hell. Speculation therefore is that the reads were actually doing something, just super slowly. Meantime threads were blocked because of the synchronized getReader (fix is in patch for HADOOP-2692). We agreed to resolve this issue Invalid. > regionserver wedged: thread holds lock during blocking read (?) > --------------------------------------------------------------- > > Key: HADOOP-2682 > URL: https://issues.apache.org/jira/browse/HADOOP-2682 > Project: Hadoop Core > Issue Type: Bug > Components: contrib/hbase > Affects Versions: 0.15.0 > Reporter: Michael Bieniosek > Fix For: 0.16.0 > > > Using a patched version of 0.15.0, my regionserver freezes up, saying repeatedly: > 2008-01-22 23:10:12,617 WARN org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020, call getProtocolVersion(org. > apache.hadoop.hbase.HRegionInterface, 1) from xxx.xxx.xxx.178:55116: discarded for being too old (46464) > Then I get: > 2008-01-22 23:11:12,644 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020, call openScanner(-ROOT-,,0, [Lorg.apache.hadoop.io.Text;@19958bf9, .META., 1201043326058, null) from xxx.xxx.45.192:44993: output error > java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108) > at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89) > at java.io.BufferedOutputStream.flushBuffer(Unknown Source) > at java.io.BufferedOutputStream.flush(Unknown Source) > at java.io.DataOutputStream.flush(Unknown Source) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:615) > 2008-01-22 23:11:12,660 INFO org.apache.hadoop.ipc.Server: Process Thread Dump: Discarding call getProtocolVersion(org.apache.hadoop.hbase.HRegionInterface, 1) from xxx.xxx.45.224:34506 > 29 active threads > Thread 4329 (IPC Client connection to xxx.xxx.44.135:10000): > State: WAITING > Blocked count: 1 > Waited count: 4 > Waiting on org.apache.hadoop.ipc.Client$Connection@21e1962d > Stack: > java.lang.Object.wait(Native Method) > java.lang.Object.wait(Unknown Source) > org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:216) > org.apache.hadoop.ipc.Client$Connection.run(Client.java:255) > Thread 37 (IPC Server handler 9 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 202 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 36 (IPC Server handler 8 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 201 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 35 (IPC Server handler 7 on 60020): > State: RUNNABLE > Blocked count: 4 > Waited count: 240 > Stack: > sun.management.ThreadImpl.getThreadInfo0(Native Method) > sun.management.ThreadImpl.getThreadInfo(Unknown Source) > sun.management.ThreadImpl.getThreadInfo(Unknown Source) > org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:114) > org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:162) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:579) > Thread 34 (IPC Server handler 6 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 203 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 33 (IPC Server handler 5 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 212 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 32 (IPC Server handler 4 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 202 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 31 (IPC Server handler 3 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 210 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 30 (IPC Server handler 2 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 216 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 29 (IPC Server handler 1 on 60020): > State: RUNNABLE > Blocked count: 3 > Waited count: 229 > Stack: > java.net.SocketInputStream.socketRead0(Native Method) > java.net.SocketInputStream.read(Unknown Source) > java.io.BufferedInputStream.fill(Unknown Source) > java.io.BufferedInputStream.read(Unknown Source) > java.io.DataInputStream.readShort(Unknown Source) > org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:773) > org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1000) > org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1096) > java.io.DataInputStream.readFully(Unknown Source) > java.io.DataInputStream.readFully(Unknown Source) > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383) > org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1360) > org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1349) > org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1344) > org.apache.hadoop.io.MapFile$Reader.(MapFile.java:263) > org.apache.hadoop.io.MapFile$Reader.(MapFile.java:242) > org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.(HStoreFile.java:816) > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > Thread 28 (IPC Server handler 0 on 60020): > State: BLOCKED > Blocked count: 1 > Waited count: 200 > Blocked on org.apache.hadoop.hbase.HStoreFile@72e8a021 > Blocked by 29 (IPC Server handler 1 on 60020) > Stack: > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:936) > org.apache.hadoop.hbase.HStore$StoreFileScanner.(HStore.java:2057) > org.apache.hadoop.hbase.HStore$HStoreScanner.(HStore.java:2203) > org.apache.hadoop.hbase.HStore.getScanner(HStore.java:2025) > org.apache.hadoop.hbase.HRegion$HScanner.(HRegion.java:1699) > org.apache.hadoop.hbase.HRegion.getScanner(HRegion.java:1271) > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1499) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > java.lang.reflect.Method.invoke(Unknown Source) > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > Thread 14 (IPC Server listener on 60020): > State: RUNNABLE > Blocked count: 0 > Waited count: 0 > Stack: > sun.nio.ch.PollArrayWrapper.poll0(Native Method) > sun.nio.ch.PollArrayWrapper.poll(Unknown Source) > sun.nio.ch.PollSelectorImpl.doSelect(Unknown Source) > sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source) > sun.nio.ch.SelectorImpl.select(Unknown Source) > sun.nio.ch.SelectorImpl.select(Unknown Source) > org.apache.hadoop.ipc.Server$Listener.run(Server.java:285) > ... etc. > Note that most of the threads are blocked on thread 29, which is in turn blocked on a socket read. This node was hosting the ROOT region, so when it got wedged it made hbase unusable (since the master won't reassign ROOT until the server actually goes down). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.