directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel L├ęcharny <elecha...@gmail.com>
Subject Uodate : [ApacheDS] Transaction support
Date Sat, 17 Feb 2018 17:26:05 GMT
Hi,

this is an update on the on-going work.

I have started to add support of transaction in JDBM first, using the
changes I have applied last week.

Actually, JDBM supports transaction natively (to some extend), and teh
big mistake we have done was to limit its use on a per-index/table
level. Which means each index and the master table are updated within
its own transaction, which is clearly not good when we have to make each
LDAP operation atomic.

So the first big change is that I use one single transaction manager
across all the index and the master table : we commit an operation
globally. I do expect that it will make it safer (ie, no concurrent
updates of indexes, leading to some database corruption).

A typical usage looks like (in OperationManager) :

public void add( AddOperationContext addContext ) throws LdapException
{
    ...
    // Find the working partition
    Partition partition =
directoryService.getPartitionNexus().getPartition( dn );
    addContext.setPartition( partition );
    ...

    lockWrite();

    // Start a Write transaction right away
    PartitionTxn transaction = null;

    try
    {
        transaction = partition.beginWriteTransaction();
        addContext.setTransaction( transaction );

        head.add( addContext );
        transaction.commit();
    }
    catch ( LdapException | IOException e )
    {
        try
            {
            if ( transaction != null )
            {
                transaction.abort();
            }

            throw new LdapOtherException( e.getMessage(), e );
        }
        catch ( IOException ioe )
        {
            throw new LdapOtherException( ioe.getMessage(), ioe );
        }
    }
    finally
    {
        unlockWrite();
    }

It sounds a bit convoluted, but what we do is :

- get the Partition we are working on at this level instead of waiting
to be in the Parition itself. That makes sense because an update
operation (even a move) can't be done across partitions. Note that we
don't fetch the Partition in the Nexus, as we already have it

- then we ask the selected partition to start a write transaction. What
is interesting here is that not all the partition will create this
transaction. Typically, the Schema partition won't - it makes no sense
-, nor the config partition : they have in-memory indexes, and the
master table is a LDIF file, it's either properly updated or not.

- the OperationContext stores the transaction and the partition : the
transaction may be reused by interceptors (typically, we may have to
lookup some other entries, or updates entries while processing the
on-going peration)

- then we call the first interceptor, down to the nexus.

- If all goes well, the transaction is committed, otherwise, it's aborted


For JDBM, a commit looks like that :

    public void commit() throws IOException
    {
        recordManager.commit();

        // And flush the journal
        BaseRecordManager baseRecordManager = null;

        if ( recordManager instanceof CacheRecordManager )
        {
            baseRecordManager = ( ( BaseRecordManager ) ( (
CacheRecordManager ) recordManager ).getRecordManager() );
        }
        else
        {
            baseRecordManager = ( ( BaseRecordManager ) recordManager );
        }


        if ( syncOnWrite )
        {
            baseRecordManager.getTransactionManager().synchronizeLog();
        }
    }

- First the RecordManager is committed, which flush the updates into the
Log file (first disk write). Up to this point, all the updates are done
in memory.

- then we ask the RecordManager to update the databse from the log.

And that's it.

Aborting the operaiton is way simpler :

    public void abort() throws IOException
    {
        recordManager.rollback();
    }


I have removed all the sync() methods called all over the code, they are
totally useless now.  Bottom line, we end up with having one single file
storing the data plus a log file containing the on going updates. If the
server crashes, I expect the server to catch up from the journal (still
have to check that aspect).

Extra benefit : it's *WAY* faster. The perf test I have shows that
adding entries in the server using this code is 2 times faster than with
the previous approach (ok, don't get too excited, I'm talking about 180
add/s vs 98 add/s)


There are a lot to do still. The Read transactions also need to be dealt
with. A typicall read looks lik that :

    ...
    LookupOperationContext lookupContext = new LookupOperationContext(
  adminSession, modifyContext.getDn(), "+", "*" );
    lookupContext.setPartition( modifyContext.getPartition() );
    lookupContext.setTransaction( modifyContext.getTransaction() );

    entry = directoryService.getPartitionNexus().lookup( lookupContext );
    ...

here we reuse the modify operation's transaction.

Or :

    LookupOperationContext lookupContext = new LookupOperationContext(
adminSession, lookupContext.setPartition( partition );

    try ( PartitionTxn partitionTxn = partition.beginReadTransaction() )
    {
        lookupContext.setTransaction( partitionTxn );
        adminGroup = nexus.lookup( lookupContext );
    }
    catch ( IOException ioe )
    {
        throw new LdapOtherException( ioe.getMessage(), ioe );
    }

Here, we create a new Read transaction in a try-with-resource call, this
transaction will be automatically closed. Note we don't have to commit
or abort it.


So now, I'm fixing the tests, changing all the places in the code where
I need to start a read transaction (around 80 places for the lookup
operation, for instance, then there is search, hasEntry, etc...),
properly encapsulating the update operations. That will take time...

Once it will work for JDBM, I will have to get it to work for Mavibot.

Still, this is encouraging.

have a nice week-end.

-- 
Emmanuel Lecharny

Symas.com
directory.apache.org


Mime
View raw message