From dev-return-40694-apmail-directory-dev-archive=directory.apache.org@directory.apache.org Wed Apr 4 22:22:51 2012 Return-Path: X-Original-To: apmail-directory-dev-archive@www.apache.org Delivered-To: apmail-directory-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 726229F8F for ; Wed, 4 Apr 2012 22:22:51 +0000 (UTC) Received: (qmail 87986 invoked by uid 500); 4 Apr 2012 22:22:51 -0000 Delivered-To: apmail-directory-dev-archive@directory.apache.org Received: (qmail 87943 invoked by uid 500); 4 Apr 2012 22:22:51 -0000 Mailing-List: contact dev-help@directory.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Apache Directory Developers List" Delivered-To: mailing list dev@directory.apache.org Received: (qmail 87936 invoked by uid 99); 4 Apr 2012 22:22:51 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Apr 2012 22:22:51 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of elecharny@gmail.com designates 74.125.82.42 as permitted sender) Received: from [74.125.82.42] (HELO mail-wg0-f42.google.com) (74.125.82.42) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Apr 2012 22:22:42 +0000 Received: by wgbds11 with SMTP id ds11so883254wgb.1 for ; Wed, 04 Apr 2012 15:22:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:reply-to:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=p49ileSF7YILOZzGY1OOJsjN4DkeMC94nVlRmu6SSfs=; b=ZlQDTKNnKfyhilVWQgYVsMSHIFQC8i39qITQMHwbA048U9sklFqjPKKXEnTYSKHIFu Zn1h60DimQ5PkxXQNA5+fYKJQcdm9frQmjg7k2wTU6EKkR1gmInig1FSybkiGxC6/tB+ nrgiPvYpT1kvP3ViQTQ78b6Wg1sDJZVqxNs6bO2cAE+jUTAbbTHOF3bMAzfx/il6QgCV 0HgiNBVyT8Onfsw0sQJp8B0YSWecvR7dJEQ6UGDDX/cpB40xAegWcYvfA9e+4QOK0ZZ4 JVmGRnWIbi65gPcAuPQ3kHkBXfAWIGkmtkE1E655K4fARauBNK7Bzj3G2XljJy2LGQ2Z ow8w== Received: by 10.216.133.72 with SMTP id p50mr100125wei.78.1333578142520; Wed, 04 Apr 2012 15:22:22 -0700 (PDT) Received: from Emmanuels-MacBook-Pro.local (ran75-1-78-192-106-184.fbxo.proxad.net. [78.192.106.184]) by mx.google.com with ESMTPS id o2sm8192023wiv.11.2012.04.04.15.22.21 (version=SSLv3 cipher=OTHER); Wed, 04 Apr 2012 15:22:21 -0700 (PDT) Message-ID: <4F7CC99C.40005@gmail.com> Date: Thu, 05 Apr 2012 00:22:20 +0200 From: =?UTF-8?B?RW1tYW51ZWwgTMOpY2hhcm55?= Reply-To: elecharny@apache.org User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: Apache Directory Developers List Subject: JDBM + MVCC LRUCache concern Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi guys, since I started to work on index removals last week, I started to get strange behaviors I put on some wrong modification I have done. Today, as I was removing the last call to the OneLevelIndex to replace it by rdnIndex, the core-integ tests are blocking. I did a kill -3 to see where I get a blockage, and here is what I got : "main" prio=5 tid=7fd9db800800 nid=0x10d310000 waiting on condition [10d30d000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at jdbm.helper.LRUCache.put(LRUCache.java:330) at jdbm.recman.SnapshotRecordManager.update(SnapshotRecordManager.java:401) at jdbm.btree.BPage.remove(BPage.java:605) at jdbm.btree.BPage.remove(BPage.java:611) at jdbm.btree.BTree.remove(BTree.java:464) at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable.remove(JdbmTable.java:741) - locked <7c226be90> (a org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable) at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:157) at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:49) at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.delete(AbstractBTreePartition.java:891) ... The associated code in LRUCache is : public void put( K key, V value, long newVersion, Serializer serializer, boolean neverReplace ) throws IOException, CacheEvictionException { ... while ( true ) { ... else { entry = this.findNewEntry( key, latchIndex ); ... } } catch ( CacheEvictionException e ) { e.printStackTrace(); // Added for debug purposes sleepForFreeEntry = totalSleepTime < this.MAX_WRITE_SLEEP_TIME; ... } ... if ( sleepForFreeEntry ) { try { Thread.sleep( sleepInterval ); .... totalSleepTime += sleepInterval; } else { break; } } Basically, we try to add a new element in the cache, it's full, we then try to evict one entry, it fails, we get a CacheEvictionException, and we go to sleep for 600 seconds... It's systematic, and I guess that the fact we now pond the RdnIndex table way more often than before (just because we don't call anymore the OneLevelIndex) cause the cache to get filled and not released fast enough. As we don't set any size for the cache, its default size is 1024. For some of the tests, this mightnot be enough, as we load a lot of entries (typically the schema elements) plus many others that get added and removed while running tests in revert mode. If I increase the default size to 65536, the tests are passing. Ok, now, I have to admit I haven't - yet - looked at the LRUCache code, and my analysis is just based on what I saw by quickly looking at the code, the stack traces I have added and some few blind guesses. However, I think we have a serious issue here. As far as I can tel, the code itself is probably not responsible for this behaviour, but the way we use it is. Did I missed something ? Is there anything we can do - except increase the cache size - to get the tests passing fine ? I'm more concern about what could occur in real life, when some users will load the server up to a point it just stop responding... Anyone ? Thanks ! -- Regards, Cordialement, Emmanuel Lécharny www.iktek.com