Return-Path: X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A296A1071B for ; Fri, 27 Dec 2013 17:12:06 +0000 (UTC) Received: (qmail 21665 invoked by uid 500); 27 Dec 2013 17:12:01 -0000 Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org Received: (qmail 21584 invoked by uid 500); 27 Dec 2013 17:11:58 -0000 Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jira@apache.org Delivered-To: mailing list notifications@accumulo.apache.org Received: (qmail 21549 invoked by uid 99); 27 Dec 2013 17:11:52 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Dec 2013 17:11:52 +0000 Date: Fri, 27 Dec 2013 17:11:52 +0000 (UTC) From: "ASF subversion and git services (JIRA)" To: notifications@accumulo.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ACCUMULO-2027) ZooKeeperInstance.close() not freeing resources in multithreaded env MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ACCUMULO-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857564#comment-13857564 ] ASF subversion and git services commented on ACCUMULO-2027: ----------------------------------------------------------- Commit 975e8c05e8d11f3848e6c800f4d2772026f6c3a3 in branch refs/heads/1.4.5-SNAPSHOT from Bill Slacum [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=975e8c0 ] ACCUMULO-2027 Synchronized access to ZooKeeperInstance methods that mutated state The usage pattern of a ZooKeeperInstance is to have many clients use the same instance. Internall, the ZooKeeperInstances keeps a cache of information in a ZooCache, which will manage connections to Zookeeper for getting and putting data. With the addition of close semantics on the ZKI, it is expected the ZKI will free resources and halt operations for all clients. This requires ZKI method calls to be linearizable, otherwise race conditions as described in ACCUMULO-2027 will occur when a ZKI is simultaneously asked to close and retrieve information. This patch resolves that race condition by synchronizing all methods on a ZKI that interact with an ZKI instance's ZooCache. > ZooKeeperInstance.close() not freeing resources in multithreaded env > -------------------------------------------------------------------- > > Key: ACCUMULO-2027 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2027 > Project: Accumulo > Issue Type: Bug > Reporter: Keith Turner > Assignee: William Slacum > Priority: Critical > Fix For: 1.4.5, 1.5.1, 1.6.0 > > > While looking at the changes related to ZooKeeperInstance.close() in the 1.4.5-SNAPSHOT branch I noticed there were race conditions where resources were not properly released. One type of race condition is where a thread is between a closed check in ZooKeeperInstance and calling a ZooCache method when ZooKeeperInstance.close() is called. The following is an example situation > # Thread 1 uses ZooKeeperInstance1 to get a zoocache. > # Thread 2 calls close() on ZooKeeperInstnce1 which calls close() on zoocache > # Thread 1 uses the zoocache it has reference to, causing a new zookeeper connection to be created. > Below is an example program that will trigger this behavior. For me this little example program reliably shows a connected zookeeper after all of the threads die. If I use 0 threads it will show a closed zookeeper connection at the end. > {code:java} > static class WriteTask implements Runnable { > private BatchWriter writer; > private Random rand; > WriteTask(Connector conn) throws TableNotFoundException { > rand = new Random(); > writer = conn.createBatchWriter("foo5", 10000000, 30000, 1); > } > @Override > public void run() { > try { > while (true) { > Mutation m1 = new Mutation(String.format("%06d", rand.nextInt(1000000))); > m1.put(String.format("%06d", rand.nextInt(100)), String.format("%06d", rand.nextInt(100)), String.format("%06d", rand.nextInt(1000000))); > writer.addMutation(m1); > writer.flush(); > } > } catch (Exception e) { > System.out.println(e.getMessage()); > } > } > } > static class ReadTask implements Runnable { > private Scanner scanner; > ReadTask(Connector conn) throws TableNotFoundException { > scanner = conn.createScanner("foo5", new Authorizations()); > } > @Override > public void run() { > try { > while (true) { > for (Entry entry : scanner) { > } > } > } catch (Exception e) { > System.out.println(e.getMessage()); > } > } > } > @Test(timeout = 30000) > public void test2() throws Exception { > ZooKeeperInstance zki = new ZooKeeperInstance(accumulo.getInstanceName(), accumulo.getZooKeepers()); > Connector conn = zki.getConnector("root", "superSecret"); > conn.tableOperations().create("foo5"); > ArrayList threads = new ArrayList(); > int numThreads = 10; > for (int i = 0; i < numThreads; i++) { > Thread t = new Thread(new WriteTask(conn)); > t.start(); > threads.add(t); > } > for (int i = 0; i < numThreads; i++) { > Thread t = new Thread(new ReadTask(conn)); > t.start(); > threads.add(t); > } > // let threads get spun up > Thread.sleep(1000); > ZooSession.printSessions(); > zki.close(); > // wait for the threads to die > for (Thread thread : threads) { > thread.join(); > } > ZooSession.printSessions(); > } > {code} > Below are some changes I made to ZooSession for debugging purposes. > {noformat} > diff --git a/src/core/src/main/java/org/apache/accumulo/core/zookeeper/ZooSession.java b/src/core/src/main/java/org/apache/accumulo/core/zookeeper/ZooSession.java > index b3db26f..475a21d 100644 > --- a/src/core/src/main/java/org/apache/accumulo/core/zookeeper/ZooSession.java > +++ b/src/core/src/main/java/org/apache/accumulo/core/zookeeper/ZooSession.java > @@ -20,6 +20,8 @@ > import java.net.UnknownHostException; > import java.util.HashMap; > import java.util.Map; > +import java.util.Map.Entry; > +import java.util.Set; > > import org.apache.accumulo.core.util.UtilWaitThread; > import org.apache.log4j.Logger; > @@ -29,7 +31,7 @@ > import org.apache.zookeeper.ZooKeeper; > import org.apache.zookeeper.ZooKeeper.States; > > -class ZooSession { > +public class ZooSession { > > private static final Logger log = Logger.getLogger(ZooSession.class); > > @@ -121,6 +123,8 @@ > > ZooSessionInfo zsi = sessions.get(sessionKey); > if (zsi != null && zsi.zooKeeper.getState() == States.CLOSED) { > + System.out.println("Removing closed session "); > + new Exception().printStackTrace(); > if (auth != null && sessions.get(readOnlySessionKey) == zsi) > sessions.remove(readOnlySessionKey); > zsi = null; > @@ -137,4 +141,13 @@ > } > return zsi.zooKeeper; > } > + > + public static synchronized void printSessions() { > + Set> es = sessions.entrySet(); > + > + for (Entry entry : es) { > + System.out.println(entry.getKey() + " " + entry.getValue().zooKeeper.getState()); > + } > + } > + > } > {noformat} > With the above changes I will see an exception like the following when one of the race conditions occurs. > {noformat} > Removing closed session > java.lang.Exception > at org.apache.accumulo.core.zookeeper.ZooSession.getSession(ZooSession.java:127) > at org.apache.accumulo.core.zookeeper.ZooReader.getSession(ZooReader.java:37) > at org.apache.accumulo.core.zookeeper.ZooReader.getZooKeeper(ZooReader.java:41) > at org.apache.accumulo.core.zookeeper.ZooCache.getZooKeeper(ZooCache.java:56) > at org.apache.accumulo.core.zookeeper.ZooCache.retry(ZooCache.java:127) > at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:233) > at org.apache.accumulo.core.zookeeper.ZooCache.get(ZooCache.java:188) > at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:156) > at org.apache.accumulo.core.client.impl.TabletLocator.getInstance(TabletLocator.java:96) > at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:245) > at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:94) > at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:176) > at org.apache.accumulo.minicluster.MiniAccumuloClusterTest$ReadTask.run(MiniAccumuloClusterTest.java:109) > at java.lang.Thread.run(Thread.java:662) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)