directory-dev mailing list archives

From Selcuk AYA <ayasel...@gmail.com>
Subject Re: Txn discussion
Date Sat, 09 Jun 2012 21:46:36 GMT
Let's say we sacrifice cross-partition txns. I think that is OK.



On Sat, Jun 9, 2012 at 10:45 PM, Howard Chu <hyc@symas.com> wrote:
> Emmanuel Lécharny wrote:
>>
>> Hi guys,
>>
>> independently from the ongoing work on the txn layer, I'd like to start
>> a thread of discussion about the path we selected, and the other
>> possible options.
>>
>> Feel free to express your opinion here, I'll create a few items I'd like
>> to see debated.
>>
>> 1) Introduction
>>
>> We badly need to have a consistent system. The fact is that the current
>> trunk (and I guess this is true for all the releases we have done so
>> far) suffers from some serious issues when multiple modifications are
>> done during searches. The reason is that we depend on a BTree
>> implementation that exposes a data structure directly reading the pages
>> containing the data, expecting those pages to remain unchanged in the
>> long run. Obviously, when we browse more than one entry, we are likely to
>> see a modification changing the data...
>>
>> 2) txn layer
>>
>> There are a few ways to get this problem solved:
>> - we can have a MVCC backend, and a protection against concurrent
>> modifications. Any read will always succeed, as each read will use a
>> revision and only one.

Let's say we want to implement a txn system within JDBM. We have to
implement this not within a single B+ tree but across B+ trees. How
will this be different from what we are trying to implement now? We
still need a WAL keeping track of txns on top of the B+ trees; changes
could be tracked in terms of pages or in terms of entries and indices.
The old version of the data has to be copied to some other location
before a newer version can overwrite it, or the newer version has to be
kept at location X as long as readers need the old data. Any MVCC system
has to do something like this.

For us, the newer version of the data is kept in the WAL as long as a
reader needs the old version. For simplicity, we keep a copy of the WAL
in memory, in a format that makes merging data for readers easier and
faster. More on this below.
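
To make the idea concrete, here is a minimal sketch in Python (chosen only
for brevity). It is not the code in the txn branch; the class and method
names are hypothetical, a dict stands in for the on-disk B+ trees, and
deletes are left out. Readers merge the base data with the in-memory WAL
entries visible at their snapshot, and WAL entries are only applied to the
base once no active reader needs the older version.

```python
class VersionedStore:
    """Hypothetical sketch: base data plus an in-memory WAL copy."""

    def __init__(self, base):
        self.base = dict(base)  # stands in for the on-disk B+ trees
        self.wal = []           # committed (version, key, value), oldest first
        self.version = 0        # version of the latest committed txn
        self.active = []        # snapshot versions of the active readers

    def begin_read(self):
        # A reader sees everything committed up to this point.
        self.active.append(self.version)
        return self.version

    def end_read(self, snapshot):
        self.active.remove(snapshot)

    def commit(self, changes):
        # Writes go to the WAL only; the base keeps the old versions.
        self.version += 1
        for key, value in changes.items():
            self.wal.append((self.version, key, value))
        return self.version

    def read(self, snapshot, key):
        # Merge: start from the base, then apply visible WAL entries in order.
        value = self.base.get(key)
        for version, k, v in self.wal:
            if k == key and version <= snapshot:
                value = v
        return value

    def flush(self):
        # Only entries visible to every active reader may overwrite the base.
        horizon = min(self.active) if self.active else self.version
        keep = []
        for version, key, value in self.wal:
            if version <= horizon:
                self.base[key] = value
            else:
                keep.append((version, key, value))
        self.wal = keep
```

A reader that started before a commit keeps seeing the old value, and
flush() leaves the base untouched until that reader calls end_read().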

I think what we implement right now is not very different from what we
would implement inside a single partition.
>> - we can also read the results quickly and store them somewhere, blocking
>> the modification until the read is finished.
>> - or we can keep a copy of the modified elements within the original
>> elements, until the searches that use those elements are finished.
>>
>> (there are probably some other solutions, but I don't know them)
>>
>> AFAICT, the transaction branch is implementing the third solution,
>> keeping the copy of modified elements in memory, so that they can be
>> sent back to the user.

It is true that the current txn system makes use of in-memory copies
for fast merging of data. However, what it really does is keep a copy
of the txn WAL in memory. This can be extended to discard the in-memory
copy and read directly from the WAL when memory use exceeds some
threshold, for example. Implementing reads from memory was just easier.
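
A rough sketch of that extension, again in Python and with hypothetical
names: every record is appended to a log file, lookups are served from an
in-memory copy while it is small, and once the copy exceeds a threshold it
is discarded and lookups scan the log instead. The JSON-lines file format
and the entry-count threshold are assumptions made for the example, not
how the txn branch stores its log.

```python
import json
import os
import tempfile


class WalCopy:
    """Hypothetical sketch: in-memory WAL copy with fallback to the log file."""

    def __init__(self, path, max_cached=1000):
        self.path = path              # assumed to be a fresh, empty log
        self.max_cached = max_cached  # threshold, counted in cached keys
        self.cache = {}               # key -> latest value; None once discarded

    def append(self, key, value):
        # The log file is always written; the cache is only an optimisation.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        if self.cache is not None:
            self.cache[key] = value
            if len(self.cache) > self.max_cached:
                self.cache = None     # too big: drop the in-memory copy

    def lookup(self, key):
        if self.cache is not None:
            return self.cache.get(key)
        # Slow path: scan the log, the last record for the key wins.
        latest = None
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                k, v = json.loads(line)
                if k == key:
                    latest = v
        return latest


# Usage: a tiny threshold forces the fallback after the third key.
wal = WalCopy(os.path.join(tempfile.mkdtemp(), "txn.wal"), max_cached=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    wal.append(k, v)
```

Both paths return the same answers; only the cost of a lookup changes.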

Also think of adding another partition tomorrow. Say an HBase partition
is added which exposes atomic writes and atomic reads or consistent
scans. If we plug that partition into what we are implementing right
now, txns over HBase partitions would just work without much effort.
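
As an illustration of why a txn layer on top of the partitions makes this
easy, here is a small Python sketch. The Partition interface, the
DictPartition stand-in and the Txn class are all hypothetical and much
simpler than the real partition API; a real layer would also write the
changes to the WAL before applying them. The point is only that the txn
code depends on atomic get/put/scan and not on how a partition stores
its data.

```python
from abc import ABC, abstractmethod


class Partition(ABC):
    """Hypothetical minimal contract: atomic reads, writes and scans."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def scan(self): ...


class DictPartition(Partition):
    """In-memory stand-in for a JDBM or HBase backed partition."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def scan(self):
        return sorted(self._data.items())


class Txn:
    """Buffers writes and applies them to the partition on commit."""

    def __init__(self, partition):
        self.partition = partition
        self.pending = {}

    def put(self, key, value):
        self.pending[key] = value

    def get(self, key):
        # A txn sees its own uncommitted writes first.
        if key in self.pending:
            return self.pending[key]
        return self.partition.get(key)

    def commit(self):
        for key, value in self.pending.items():
            self.partition.put(key, value)
        self.pending = {}

    def abort(self):
        self.pending = {}
```

Swapping DictPartition for another implementation of the same three
methods would not require any change to Txn.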

>>
>> None of those solutions is free of drawbacks.
>>
>> I think that the first approach, even if it implies we force a
>> serialization of the writes, is the best solution. The rationale, AFAICT,
>> is that we don't have to deal with the way the backend keeps versions of
>> elements, this is not our business. Plus keeping the writes serialized
>> guarantees that we won't compromise the backend.
>
>
> IMAO, this is also the best. ;) It's extremely memory efficient, it's
> extremely efficient for reads, and it is perfectly consistent.
>
>
>> At this point, I'd like us to discuss all those options, regardless of
>> what we are currently working on.
>
>
> --
>  -- Howard Chu
>  CTO, Symas Corp.           http://www.symas.com
>  Director, Highland Sun     http://highlandsun.com/hyc/
>  Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
>

thanks
Selcuk
