directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject JDBM + MVCC LRUCache concern
Date Wed, 04 Apr 2012 22:22:20 GMT
Hi guys,

Since I started working on index removals last week, I have been seeing
strange behaviors that I blamed on some wrong modification of mine. Today,
as I was removing the last call to the OneLevelIndex to replace it with the
rdnIndex, the core-integ tests started blocking.

I did a kill -3 to see where it blocks, and here is what I got:

"main" prio=5 tid=7fd9db800800 nid=0x10d310000 waiting on condition 
[10d30d000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
         at java.lang.Thread.sleep(Native Method)
         at jdbm.helper.LRUCache.put(LRUCache.java:330)
         at jdbm.recman.SnapshotRecordManager.update(SnapshotRecordManager.java:401)
         at jdbm.btree.BPage.remove(BPage.java:605)
         at jdbm.btree.BPage.remove(BPage.java:611)
         at jdbm.btree.BTree.remove(BTree.java:464)
         at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable.remove(JdbmTable.java:741)
         - locked <7c226be90> (a org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmTable)
         at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:157)
         at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmRdnIndex.drop(JdbmRdnIndex.java:49)
         at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.delete(AbstractBTreePartition.java:891)
...

The associated code in LRUCache is :

     public void put( K key, V value, long newVersion, Serializer serializer,
         boolean neverReplace ) throws IOException, CacheEvictionException
     {
     ...
         while ( true )
         {
         ...
                 else
                 {
                     entry = this.findNewEntry( key, latchIndex );
                     ...
                 }
             }
             catch ( CacheEvictionException e )
             {
                 e.printStackTrace(); // Added for debug purposes
                 sleepForFreeEntry = totalSleepTime < this.MAX_WRITE_SLEEP_TIME;

                 ...
             }
             ...

             if ( sleepForFreeEntry )
             {
                 try
                 {
                     Thread.sleep( sleepInterval );
                 ....
                 totalSleepTime += sleepInterval;
             }
             else
             {
                 break;
             }
         }

Basically, we try to add a new element to the cache. It is full, so we
try to evict one entry. That fails with a CacheEvictionException, and
we go to sleep for 600 seconds...
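
To make the pattern concrete, here is a minimal, self-contained sketch of a
bounded cache whose put() sleeps and retries when eviction fails. It is not
the JDBM LRUCache code: the class BoundedRetryCache, the exception
EvictionFailedException and the notion of "pinned" entries are all
hypothetical, and pinning only stands in for whatever makes an entry
non-evictable in the real cache. Throwing once the sleep budget is used up is
also an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Set;

public class BoundedRetryCache {
    /** Hypothetical stand-in for JDBM's CacheEvictionException. */
    public static class EvictionFailedException extends Exception {
        public EvictionFailedException(String message) { super(message); }
    }

    private final int capacity;
    private final long maxSleepMillis;
    private final long sleepIntervalMillis;
    // Access-ordered map: iteration runs from least to most recently used.
    private final LinkedHashMap<String, String> entries =
        new LinkedHashMap<>(16, 0.75f, true);
    // Keys that may not be evicted (assumption: something still needs them).
    private final Set<String> pinned = new HashSet<>();

    public BoundedRetryCache(int capacity, long maxSleepMillis, long sleepIntervalMillis) {
        this.capacity = capacity;
        this.maxSleepMillis = maxSleepMillis;
        this.sleepIntervalMillis = sleepIntervalMillis;
    }

    public synchronized void pin(String key) { pinned.add(key); }
    public synchronized void unpin(String key) { pinned.remove(key); }
    public synchronized boolean contains(String key) { return entries.containsKey(key); }
    public synchronized int size() { return entries.size(); }

    /** Removes the least recently used unpinned entry, or fails if there is none. */
    private void evictOne() throws EvictionFailedException {
        Iterator<String> it = entries.keySet().iterator();
        while (it.hasNext()) {
            if (!pinned.contains(it.next())) {
                it.remove();
                return;
            }
        }
        throw new EvictionFailedException("all " + entries.size() + " entries are pinned");
    }

    /** Inserts an entry, sleeping and retrying while no slot can be freed. */
    public void put(String key, String value)
            throws EvictionFailedException, InterruptedException {
        long totalSleep = 0;
        while (true) {
            synchronized (this) {
                try {
                    if (!entries.containsKey(key) && entries.size() >= capacity) {
                        evictOne();
                    }
                    entries.put(key, value);
                    return;
                } catch (EvictionFailedException e) {
                    // Give up once the sleep budget is exhausted.
                    if (totalSleep >= maxSleepMillis) {
                        throw e;
                    }
                }
            }
            // Sleep outside the lock so another thread could unpin an entry.
            Thread.sleep(sleepIntervalMillis);
            totalSleep += sleepIntervalMillis;
        }
    }

    public static void main(String[] args) throws Exception {
        BoundedRetryCache cache = new BoundedRetryCache(2, 30, 10);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.pin("a");
        cache.pin("b");
        try {
            cache.put("c", "3"); // nothing releases a pin, so sleeping cannot help
        } catch (EvictionFailedException e) {
            System.out.println("gave up: " + e.getMessage());
        }
        cache.unpin("a");
        cache.put("c", "3"); // "a" is now evictable, so this succeeds at once
        System.out.println("contains c: " + cache.contains("c"));
    }
}
```

In the sketch, sleeping only helps if some other thread frees an entry in the
meantime. If the thread that sleeps is also the only one that could free an
entry, the whole sleep budget is spent for nothing, which from the outside
looks like a hang.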

It's systematic, and I guess that the fact we now pound the RdnIndex
table way more often than before (just because we no longer call the
OneLevelIndex) causes the cache to fill up without being released fast
enough.

As we don't set any size for the cache, its default size is 1024. For
some of the tests, this might not be enough, as we load a lot of entries
(typically the schema elements) plus many others that get added and
removed while running tests in revert mode.

If I increase the default size to 65536, the tests are passing.

OK, now, I have to admit I haven't looked at the LRUCache code yet,
and my analysis is just based on what I saw by quickly looking at the
code, the stack traces I have added and a few blind guesses.
However, I think we have a serious issue here. As far as I can tell, the
code itself is probably not responsible for this behaviour, but the way
we use it is.

Did I miss something? Is there anything we can do, other than increasing
the cache size, to get the tests passing?

I'm more concerned about what could occur in real life, when some users
load the server up to the point where it just stops responding...

Anyone ?

Thanks !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

