directory-dev mailing list archives

From Alex Karasulu <>
Subject Re: Txn discussion
Date Sun, 10 Jun 2012 09:20:22 GMT
On Sat, Jun 9, 2012 at 6:22 PM, Emmanuel Lécharny <> wrote:

> Hi guys,
> independently from the ongoing work on the txn layer, I'd like to start a
> thread of discussion about the path we selected, and the other possible
> options.
> Feel free to express your opinion here, I'll create a few items I'd like
> to see debated.
> 1) Introduction
> We badly need to have a consistent system. The fact is that the current
> trunk (and I guess this is true for all the releases we have done so far)
> suffers from some serious issues when multiple modifications are done during
> searches. The reason is that we depend on a BTree implementation that
> exposes a data structure directly reading the pages containing the data,
> expecting those pages to remain unchanged in the long run. Obviously, when
> we browse more than one entry, we are likely to see a modification changing
> the data...
> 2) txn layer
> There are a few ways to get this problem solved:
> - we can have a MVCC backend, and a protection against concurrent
> modifications. Any read will always succeed, as each read will use a
> revision and only one.
> - we can also read the results quickly and store them somewhere, blocking
> modifications until the read is finished.
> - or we can keep a copy of the modified elements within the original
> elements, until the searches that use those elements are finished.
> (there are probably some other solutions, but I don't know them)
> AFAICT, the transaction branch is implementing the third solution, keeping
> the copy of modified elements in memory, so that they can be sent back to
> the user.
> None of those solutions is free of drawbacks.
Right now we're adding the foundations, so of course there will be issues
initially. There are several techniques we can use to mitigate the problems.

> I think that the first approach, even if it implies we force a
> serialization of the writes, is the best solution. The rationale, AFAICT, is
> that we don't have to deal with the way the backend keeps versions of
> elements, this is not our business. Plus keeping the writes serialized
> guarantees that we won't compromise the backend.
As Selcuk already pointed out, you will need the same machinery to do this
below, inside the partition. It will lead to the same problems.
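
To make the first option concrete, here is a minimal sketch of revision-based
reads with serialized writes. It is an illustration only, not ApacheDS or JDBM
code: the names Snapshot, MvccStore, beginRead and put are hypothetical, and a
plain map stands in for the BTree.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: not the ApacheDS or JDBM API.
// An immutable view of the data at exactly one revision.
final class Snapshot<K, V> {
    final long revision;
    final Map<K, V> entries;

    Snapshot(long revision, Map<K, V> entries) {
        this.revision = revision;
        this.entries = Collections.unmodifiableMap(entries);
    }
}

final class MvccStore<K, V> {
    private final AtomicReference<Snapshot<K, V>> current =
        new AtomicReference<>(new Snapshot<K, V>(0L, new HashMap<K, V>()));

    // A reader pins one revision and never observes later writes.
    Snapshot<K, V> beginRead() {
        return current.get();
    }

    // Writes are serialized: one writer at a time copies the latest
    // revision, applies its change, and publishes the next revision.
    synchronized long put(K key, V value) {
        Snapshot<K, V> old = current.get();
        Map<K, V> copy = new HashMap<>(old.entries);
        copy.put(key, value);
        Snapshot<K, V> next = new Snapshot<>(old.revision + 1, copy);
        current.set(next);
        return next.revision;
    }
}
```

A search that calls beginRead() once and iterates over that snapshot is
unaffected by concurrent put() calls. The sketch copies the whole map on each
write to stay short; a real MVCC BTree would presumably share unmodified pages
between revisions and would also need to reclaim old revisions.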

> At this point, I'd like us to discuss all those options, whatever we are
> currently working on.
> 3) cross-partition vs single partition protection
> Atm, we are working on a cross partition system. That means we protect all
> the partitions at the same time : moving an entry from one partition to
> another one will be done completely, or reverted.
> I'm not sure we need such a feature. I don't see what it brings, and even
> if it brings some advantages, I'm not sure we need such a feature now.
I'm in complete disagreement. There are several reasons why we need to do
this across partitions:

* First, keeping partitions simple: handling these semantics inside the
partitions would make new partitions far too difficult to write
* Aliases working across partitions
* Implementing views and being able to have editable views
* Centrally rooted partition
* Nestable partitions
* ACID across partitions
* Better means to integrate with HBase partition
* Better cache management
* Better means to handle snapshotting and rollback
* Clear transaction boundaries even if changes are across partitions which
makes replication easier to handle.

Say goodbye to a lot of these factors if we do not do this.
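
As an illustration of the "done completely, or reverted" behaviour for a move
between partitions, here is a minimal sketch of an undo log held above the
partitions. It is not the txn branch's design: SimplePartition,
CrossPartitionTxn, move and the failNextAdd hook are hypothetical names, and
the sketch covers atomicity only, not isolation or durability.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a partition: an in-memory map keyed by DN.
final class SimplePartition {
    final Map<String, String> entries = new HashMap<>();
    boolean failNextAdd = false; // hook to simulate a failing write

    void add(String dn, String entry) {
        if (failNextAdd) {
            failNextAdd = false;
            throw new IllegalStateException("add failed: " + dn);
        }
        entries.put(dn, entry);
    }

    String delete(String dn) {
        return entries.remove(dn);
    }
}

// Hypothetical txn layer above the partitions: it records an undo action
// for each applied change and replays them in reverse order on failure.
final class CrossPartitionTxn {
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    boolean move(String dn, SimplePartition from, SimplePartition to) {
        try {
            String entry = from.delete(dn);
            if (entry == null) {
                return false; // nothing to move, nothing to undo
            }
            undoLog.push(() -> from.entries.put(dn, entry));
            to.add(dn, entry);
            undoLog.clear(); // commit: the move is complete
            return true;
        } catch (RuntimeException e) {
            while (!undoLog.isEmpty()) {
                undoLog.pop().run(); // revert in reverse order
            }
            return false;
        }
    }
}
```

The partitions stay simple here: they only add and delete, and the undo
bookkeeping exists once, above them, rather than once per partition
implementation.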

> Not having to add a txn layer above the partitions is way easier to
> implement.
Probably easier, but not that much easier. We will need the same machinery
if this is to work at the partition level. And the machinery will have to be
implemented separately for each partition.

> Here, too, I'd like us to discuss our options, and the pros and cons of using
> a txn layer on top of single partitions instead of
> multiple partitions.
I'm completely against this move as I think it will cause us more problems
than the ones we can fully solve right now. We just need patience.

Emmanuel, if you don't have time to deal with this painful merge, perhaps
Selcuk and I can handle it?

> ok, this is probably enough elements for us to discuss. Your turn :)
I understand there are hairy issues. However, realize that this is an
incomplete state, and that we do have ways to handle all the
problems. Selcuk provided some excellent solutions in this thread.

To back out now would be a massive mistake. It would also curtail the
growth and progress of the server in the ways described in our application
document. This single decision here would be one of the worst we've ever
made if we decide to back out at this stage.

FYI I'm going to be on the road for the next 48-72 hours. Will still try to
respond to this thread.

Best Regards,
-- Alex
