directory-dev mailing list archives

From Emmanuel Lécharny
Subject Re: Txn discussion
Date Sun, 10 Jun 2012 08:59:45 GMT
On 6/10/12 2:48 AM, Selcuk AYA wrote:
>>> How
>>> will this be different from what we are trying to implement now? We
>>> still need a WAL log keeping track of txns on top of B+ trees, changes
>>> could be kept track of in terms of pages or entries and indices. Old
>>> version of data has to be copied over to some other location before
>>> newer version can overwrite it or newer version has to be kept at
>>> location X as long as readers need the old data. Any MVCC system has
>>> to do something like this.
>> No, we don't need all this mechanism if we block all the modifications while
>> a modification is being processed. I agree that modifications will be
>> slower, but this is a price I want to pay if, at the same time, I can
>> guarantee consistent *and* concurrent reads.
> you have a single modification that touches a couple of entries and
> indices, how will reads proceed concurrently if the ongoing
> modification does not pay attention to not overwriting the versions
> the reads are using ?

Because when the read starts, it uses the latest existing revision of 
each index it needs to fetch the entries. We should get the current 
revision of each of those indices when the read starts. Currently, as we 
use reverse indexes from potentially many indices, that implies we 
introduce a protected section in the read that fetches the valid 
versions of all the indices it uses. We can have a data structure that 
contains those versions and is updated atomically by a modification, 
so that the searches don't have to take care of it. When a modification 
starts, it copies this data structure, does its updates, and at the end, 
if everything went fine, updates the data structure with the new 
revisions for all the tables.
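
To make the idea concrete, here is a minimal sketch of such a data structure in Java. It is only an illustration under stated assumptions: the class name RevisionTable, its methods and the index names are hypothetical and do not exist in ApacheDS or JDBM. It also swaps the protected section for a single atomic reference read, which works because the whole table is replaced at once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from ApacheDS or JDBM.
final class RevisionTable {
    // Immutable map: index name -> latest committed revision.
    private final AtomicReference<Map<String, Long>> current;

    // Serializes writers, so only one modification runs at a time.
    private final ReentrantLock writeLock = new ReentrantLock();

    RevisionTable(Map<String, Long> initial) {
        current = new AtomicReference<>(Map.copyOf(initial));
    }

    // A read calls this once when it starts and keeps using the returned
    // revisions, whatever modifications commit in the meantime.
    Map<String, Long> snapshot() {
        return current.get();
    }

    // A modification works on a private copy. The copy is published
    // only if the update completes without throwing.
    void modify(Consumer<Map<String, Long>> update) {
        writeLock.lock();
        try {
            Map<String, Long> working = new HashMap<>(current.get());
            update.accept(working);
            current.set(Map.copyOf(working));
        } finally {
            writeLock.unlock();
        }
    }
}
```

A search would call snapshot() once and read every index at the revisions it got back. A modification that fails halfway never publishes its copy, so readers only ever see a complete set of revisions. The sketch says nothing about how long the old revisions must be kept on disk for the readers still using them.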
> Also think of adding another partition tomorrow. Say HBASE partition
> is added which exposes atomic writes and atomic reads or
> consistent scans. If we plug that partition with what we are
> implementing right now, txns over HBASE partitions would just work
> without much effort.
>> Yes. What you have written is also a way to keep partition dumb. What I'm
>> suggesting forces you to have MVCC capable partitions, which is a real
>> hassle. Now, let's face it : do we need anything else, atm ? Plus HBase
>> already implements a similar system to protect reads against concurrent
>> modifications, so we don't necessarily need to have it.
>> Also keep in mind that if we want to implement the solution I proposed, we
>> still need to modify the code to protect the partitions against concurrent
>> modifications, and to leverage the MVCC parts in JDBM (and probably write
>> the versions on disk too).
> no. HBASE is not transactional. You still need transactions to make
> queries consistent.
HBase now supports multi-row transactions : I guess this 
is what we need.

Emmanuel Lécharny
