directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Partly done : [ApacheDS] Transaction support
Date Mon, 26 Feb 2018 10:40:48 GMT
To be clear :

'partly done' means we don't yet support transactions driven by a client
request (aka RFC 5805). This will be the next step, but I think it's not
urgent.

Also we don't have transactions for the Mavibot partition yet, as we
need to move to the newest version of Mavibot, which has not been
released yet. Again, it's not urgent.

That's it for today :-)


On 26/02/2018 at 11:27, Emmanuel Lecharny wrote:
> Hi guys,
> 
> I'm done with the changes in the server : all the tests are now passing green.
> 
> That means we have a JDBM-based server in which each operation is atomic. That should solve
> the database corruption we have.
> 
> In the process, I also changed the way the API handles requests/responses, modifying the
> operation Future and removing the queues that were used, except for operations that can receive
> more than one response (Bind, Search and ExtendedOperation). We now use a simple wait/notify
> solution that works well. The Cursor implementation has also been slightly modified to remove
> the monitor in it, as it was not used. Last but not least, the operation Future.isDone() method
> now returns 'true' if we have received an operation response. This is important for async
> implementations, as it's a non-blocking way to know whether we can go on and fetch the
> operation (if it's not abandoned, of course).
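
To illustrate the wait/notify approach, here is a minimal sketch of a future whose isDone() never blocks. The class SimpleResponseFuture and its methods are hypothetical names made up for this example; this is not the actual LDAP API code, only one way such a future could be written.

```java
// Hypothetical sketch: NOT the real LDAP API future class.
final class SimpleResponseFuture<R> {
    private final Object lock = new Object();
    private R response;   // guarded by lock
    private boolean done; // guarded by lock

    /** Called by the I/O thread when the operation response arrives. */
    void set(R value) {
        synchronized (lock) {
            response = value;
            done = true;
            lock.notifyAll(); // wake up any thread blocked in get()
        }
    }

    /** Non-blocking: true once a response has been received. */
    boolean isDone() {
        synchronized (lock) {
            return done;
        }
    }

    /**
     * Waits up to timeoutMillis for the response.
     * Returns null on timeout or if the waiting thread is interrupted.
     */
    R get(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (lock) {
            while (!done) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return null;
                }
            }
            return response;
        }
    }
}
```

An async caller can poll isDone() and only call get() once it returns true, so it never blocks on the lock for longer than the time needed to read a flag.
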
> 
> Otherwise, I have also removed a useless feature in the JDBM Partition : we were storing
> the number of elements in a B-tree beside the B-tree, which is totally redundant as the B-tree
> already knows this number : there is a btree.size() method. This saves a few writes.
> 
> Overall, as I said, the server is 20% faster, and this is due to saved writes and to
> changes in the way the LDAP API handles values.
> 
> I still have some cleanup to do, but I think we are on the verge of a release of the API
> and the server, which will also lead to a release of Studio :-)
> 
> On 2018/02/21 22:01:38, Emmanuel Lecharny <elecharny@apache.org> wrote: 
>> Hi,
>>
>> a quick update on the on-going work (last week-end and last night).
>>
>> Most of the core-integ tests are passing now (except 3 errors and 4 failures). FTR,
>> there are 335 instances of OperationContext to check in the whole code, many of them using
>> a readTransaction (149).
>>
>> I'm facing 2 issues atm :
>> - first, the Search operation must not close the transaction until the associated
>> cursor is closed. This is not currently properly done. The best solution I foresee is to store
>> the transaction in the Cursor, and to close it when the cursor is closed. That would imply
>> we may face some issues if the cursor remains open for a long time.
>> - second, as we don't have a file per index anymore, the JdbmPartition initialization
>> is now problematic : we are checking the presence of such files on disk to see if we have
>> to re-create the missing indexes. IMO, this is a wrong thing to do : we should check the
>> configuration and create the indexes accordingly if they aren't present in the database. That
>> also means we should check in the database, something we currently don't do. This is something
>> I have to fix, it's blocking some of the tests (and likely the server won't restart, too).
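
The first issue (keeping the transaction open until the cursor is closed) could be sketched as follows. DemoReadTxn and TxnCursor are hypothetical stand-ins invented for this example, not ApacheDS classes; the sketch only shows the ownership idea, where the cursor holds the transaction and closes it exactly once.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a read transaction; NOT ApacheDS's PartitionTxn.
final class DemoReadTxn implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical cursor that owns the transaction it reads under.
final class TxnCursor<E> implements Iterator<E>, AutoCloseable {
    private final DemoReadTxn txn;
    private final Iterator<E> results;
    private boolean closed;

    TxnCursor(DemoReadTxn txn, List<E> results) {
        this.txn = txn;
        this.results = results.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && results.hasNext();
    }

    @Override
    public E next() {
        if (closed) {
            throw new IllegalStateException("cursor is closed");
        }
        return results.next();
    }

    /** Closing the cursor is what ends the transaction; calling it twice is harmless. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            txn.close();
        }
    }
}
```

The drawback mentioned above is visible here: as long as nobody calls close() on the cursor, the transaction stays open.
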
>>
>> Otherwise, I have checked all the operations, before the OperationManager and after
>> it : the transaction seems to be properly initiated and committed or aborted.
>>
>> I'm not far from something that works...
>>
>> More later.
>>
>> On 2018/02/17 17:26:05, Emmanuel Lécharny <elecharny@gmail.com> wrote:

>>> Hi,
>>>
>>> this is an update on the on-going work.
>>>
>>> I have started to add support for transactions in JDBM first, using the
>>> changes I have applied last week.
>>>
>>> Actually, JDBM supports transactions natively (to some extent), and the
>>> big mistake we have made was to limit their use to a per-index/table
>>> level. Which means each index and the master table is updated within
>>> its own transaction, which is clearly not good when we have to make each
>>> LDAP operation atomic.
>>>
>>> So the first big change is that I use one single transaction manager
>>> across all the indexes and the master table : we commit an operation
>>> globally. I do expect that it will make it safer (ie, no concurrent
>>> updates of indexes, leading to some database corruption).
>>>
>>> A typical usage looks like (in OperationManager) :
>>>
>>> public void add( AddOperationContext addContext ) throws LdapException
>>> {
>>>     ...
>>>     // Find the working partition
>>>     Partition partition =
>>>         directoryService.getPartitionNexus().getPartition( dn );
>>>     addContext.setPartition( partition );
>>>     ...
>>>
>>>     lockWrite();
>>>
>>>     // Start a Write transaction right away
>>>     PartitionTxn transaction = null;
>>>
>>>     try
>>>     {
>>>         transaction = partition.beginWriteTransaction();
>>>         addContext.setTransaction( transaction );
>>>
>>>         head.add( addContext );
>>>         transaction.commit();
>>>     }
>>>     catch ( LdapException | IOException e )
>>>     {
>>>         // Roll back the whole operation, then report the original failure
>>>         try
>>>         {
>>>             if ( transaction != null )
>>>             {
>>>                 transaction.abort();
>>>             }
>>>
>>>             throw new LdapOtherException( e.getMessage(), e );
>>>         }
>>>         catch ( IOException ioe )
>>>         {
>>>             throw new LdapOtherException( ioe.getMessage(), ioe );
>>>         }
>>>     }
>>>     finally
>>>     {
>>>         unlockWrite();
>>>     }
>>> }
>>>
>>> It sounds a bit convoluted, but what we do is :
>>>
>>> - get the Partition we are working on at this level instead of waiting
>>> to be in the Partition itself. That makes sense because an update
>>> operation (even a move) can't be done across partitions. Note that we
>>> don't fetch the Partition in the Nexus, as we already have it.
>>>
>>> - then we ask the selected partition to start a write transaction. What
>>> is interesting here is that not all partitions will create this
>>> transaction. Typically, the Schema partition won't - it makes no sense
>>> -, nor the config partition : they have in-memory indexes, and the
>>> master table is a LDIF file, it's either properly updated or not.
>>>
>>> - the OperationContext stores the transaction and the partition : the
>>> transaction may be reused by interceptors (typically, we may have to
>>> lookup some other entries, or update entries while processing the
>>> on-going operation)
>>>
>>> - then we call the first interceptor, down to the nexus.
>>>
>>> - If all goes well, the transaction is committed, otherwise, it's aborted
>>>
>>>
>>> For JDBM, a commit looks like this :
>>>
>>>     public void commit() throws IOException
>>>     {
>>>         recordManager.commit();
>>>
>>>         // And flush the journal
>>>         BaseRecordManager baseRecordManager = null;
>>>
>>>         if ( recordManager instanceof CacheRecordManager )
>>>         {
>>>             baseRecordManager = ( BaseRecordManager )
>>>                 ( ( CacheRecordManager ) recordManager ).getRecordManager();
>>>         }
>>>         else
>>>         {
>>>             baseRecordManager = ( ( BaseRecordManager ) recordManager );
>>>         }
>>>
>>>
>>>         if ( syncOnWrite )
>>>         {
>>>             baseRecordManager.getTransactionManager().synchronizeLog();
>>>         }
>>>     }
>>>
>>> - First the RecordManager is committed, which flushes the updates into the
>>> Log file (first disk write). Up to this point, all the updates are done
>>> in memory.
>>>
>>> - then, if syncOnWrite is set, we ask the TransactionManager to update
>>> the database from the log.
>>>
>>> And that's it.
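
The two steps (commit to the log, then apply the log to the database) can be pictured with a small model. TinyJournaledStore is a hypothetical class written for this sketch; it borrows the names commit, rollback and synchronizeLog from the code above but uses in-memory collections where JDBM uses files, so it says nothing about JDBM internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a journaled store; NOT JDBM's implementation.
final class TinyJournaledStore {
    private final Map<String, String> database = new HashMap<>(); // stands in for the data file
    private final List<String[]> log = new ArrayList<>();         // stands in for the log file
    private final List<String[]> pending = new ArrayList<>();     // updates still in memory

    /** Records an update in memory only. */
    void put(String key, String value) {
        pending.add(new String[] { key, value });
    }

    /** Step 1: move the in-memory updates into the log. */
    void commit() {
        log.addAll(pending);
        pending.clear();
    }

    /** Drops the in-memory updates; the log and the database are untouched. */
    void rollback() {
        pending.clear();
    }

    /** Step 2: apply the logged updates to the database, then empty the log. */
    void synchronizeLog() {
        for (String[] record : log) {
            database.put(record[0], record[1]);
        }
        log.clear();
    }

    String read(String key) {
        return database.get(key);
    }

    int logSize() {
        return log.size();
    }
}
```

In this model, a crash after commit() but before synchronizeLog() loses nothing, because the committed updates are still in the log and can be replayed.
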
>>>
>>> Aborting the operation is way simpler :
>>>
>>>     public void abort() throws IOException
>>>     {
>>>         recordManager.rollback();
>>>     }
>>>
>>>
>>> I have removed all the sync() methods called all over the code, they are
>>> totally useless now. Bottom line, we end up with one single file
>>> storing the data plus a log file containing the ongoing updates. If the
>>> server crashes, I expect the server to catch up from the journal (still
>>> have to check that aspect).
>>>
>>> Extra benefit : it's *WAY* faster. The perf test I have shows that
>>> adding entries in the server using this code is almost 2 times faster
>>> than with the previous approach (ok, don't get too excited, I'm talking
>>> about 180 add/s vs 98 add/s)
>>>
>>>
>>> There is still a lot to do. The Read transactions also need to be dealt
>>> with. A typical read looks like this :
>>>
>>>     ...
>>>     LookupOperationContext lookupContext = new LookupOperationContext(
>>>         adminSession, modifyContext.getDn(), "+", "*" );
>>>     lookupContext.setPartition( modifyContext.getPartition() );
>>>     lookupContext.setTransaction( modifyContext.getTransaction() );
>>>
>>>     entry = directoryService.getPartitionNexus().lookup( lookupContext );
>>>     ...
>>>
>>> Here we reuse the modify operation's transaction.
>>>
>>> Or :
>>>
>>>     LookupOperationContext lookupContext = new LookupOperationContext(
>>>         adminSession, ... );
>>>     lookupContext.setPartition( partition );
>>>
>>>     try ( PartitionTxn partitionTxn = partition.beginReadTransaction() )
>>>     {
>>>         lookupContext.setTransaction( partitionTxn );
>>>         adminGroup = nexus.lookup( lookupContext );
>>>     }
>>>     catch ( IOException ioe )
>>>     {
>>>         throw new LdapOtherException( ioe.getMessage(), ioe );
>>>     }
>>>
>>> Here, we create a new Read transaction in a try-with-resources statement, so
>>> this transaction will be automatically closed. Note we don't have to commit
>>> or abort it.
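
The automatic close can be checked with a small stand-alone example. ScopedReadTxn and ReadTxnDemo are hypothetical names used only for this sketch; the only assumption carried over from the code above is that the transaction type is closeable and that close() may throw an IOException.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical closeable read transaction; NOT ApacheDS's PartitionTxn.
final class ScopedReadTxn implements Closeable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
    }
}

final class ReadTxnDemo {
    /**
     * Runs a fake lookup inside a read transaction and returns the
     * transaction so the caller can check that it was closed.
     */
    static ScopedReadTxn runLookup(boolean fail) {
        ScopedReadTxn handle = new ScopedReadTxn();

        try (ScopedReadTxn txn = handle) {
            if (fail) {
                throw new IllegalStateException("lookup failed");
            }
        } catch (IOException | IllegalStateException e) {
            // The quoted code wraps the IOException in an LdapOtherException;
            // this sketch simply swallows the failure.
        }

        return handle;
    }
}
```

Whether the body completes normally or throws, close() runs before the catch block, which is why no explicit commit or abort call is needed for a read.
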
>>>
>>>
>>> So now, I'm fixing the tests, changing all the places in the code where
>>> I need to start a read transaction (around 80 places for the lookup
>>> operation, for instance, then there is search, hasEntry, etc...),
>>> properly encapsulating the update operations. That will take time...
>>>
>>> Once it works for JDBM, I will have to get it to work for Mavibot.
>>>
>>> Still, this is encouraging.
>>>
>>> have a nice week-end.
>>>
>>> -- 
>>> Emmanuel Lecharny
>>>
>>> Symas.com
>>> directory.apache.org
>>>
>>>
>>

-- 
Emmanuel Lecharny

Symas.com
directory.apache.org

