directory-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Accorsi, Carlo" <Carlo.Acco...@siemens.com>
Subject RE: ApacheDS with Mavibot anytime soon?
Date Thu, 24 Mar 2016 23:02:43 GMT
HI Ashma, 
We have all our directory content dumped out as an LDIF on a periodic schedule. If the system
crashes, we delete the partition folders on the file system, restart the server and  re-inject
the entries from the ldif. 

If your directory is small < 1000 entries, you can just get them perhaps all in one search
result. 
Otherwise use a paged result to loop through each entry and call:

String ldif = LdifUtils.convertToLdif(entry)

and write / append the string value to a file. This is your backup ldif file. 

Hope this helps.. 




-----Original Message-----
From: Ashma Shrestha [mailto:ashma.shrestha@gmail.com] 
Sent: Thursday, March 24, 2016 4:38 PM
To: users@directory.apache.org
Subject: Re: ApacheDS with Mavibot anytime soon?

Hi Carlos,

You have mentioned "we get the 'ERR_554 double get for block' and then we need to restore.
We have an automated way of doing this now so it's not too disruptive when it happens." .
I am running into the same ERR_554 issue, can you let me know how you fixed it?

Thanks,

Ashma

On Mon, Sep 28, 2015 at 9:15 AM, <Carlo.Accorsi@ibs-ag.com> wrote:

> Emmanuel! Wow!! Thank you for that most detailed and comprehensive 
> write up of the concurrency problem you are working to solve with 
> Mavibot.  The design sounds world class.
> I now fully appreciate the complexity of this transaction issue and 
> the ASCII art helped make it clear. We realize you're all doing your 
> best to devise a robust solution for this and so we'll sit tight for now.
> I appreciate the time you took to explain this to us. Have fun at the 
> conference.
> Best, Carlo Accorsi
>
>
>
>
> -----Original Message-----
> From: Emmanuel Lécharny [mailto:elecharny@gmail.com]
> Sent: Sunday, September 27, 2015 1:23 PM
> To: users@directory.apache.org
> Subject: Re: ApacheDS with Mavibot anytime soon?
>
> Le 23/09/15 21:41, Carlo.Accorsi@ibs-ag.com a écrit :
> > Hi,
>
> hi,
>
> sorry for the delayed response ! I saw your mail 3 days ago, but I 
> forgot to answer back then, being caught in a storm of delaying events...
> >
> > We have one customer that wants to get replication implemented. 
> > We've
> set it up in house and it seems to work well.
> > The trouble is there are 10's of thousands of users and all sorts of
> concurrent activity. They're on M16 and a few times per month, we get 
> the
> 'ERR_554 double get for block' and then we need to restore.
> > We have an automated way of doing this now so it's not too 
> > disruptive
> when it happens.
> Hopefully. Still I understand this is not convenient.
> >
> > My question is, would adding replication just add more concurrent
> activity to the servers and lead more frequently to the ERR_554 situation?
> For the consumer, it will be just as if the updates were coming from a 
> standard user. So, no, I don't think it's a real problem.
> >
> > Also he's asked to upgrade to M20 but my hunch is that for this 
> > issue it
> won't matter much because we're still on JDBM?
>
> True. It's just that some bugs have been fixed in M20.
> >
> > Finally, we're eager to test out a release with the Mavibot backed.
> > I'm sure it's possible to build ourselves but we're trying to keep 
> > the
> moving parts to a minimum and use only ApacheDS releases.
> I understand.
>
> Mavibot work has been delayed, and we are working on it as much as we 
> can, but it's not enough. Kiran and I have day jobs atm that forbid us 
> to dedicate as much time as would be necessary to complete the last 
> part that is mandatory to get this problem fixed : the transaction 
> system. We restarted to 'think' about the best implementation 2 weeks 
> ago (on your free time) and we expect to be able to spend more time on 
> it next week, as we will both be at Budapest, during the Apache Conference.
>
> Transaction  support (the way we need it, ie cross-btree) is not that 
> simple to impelment. We have a pretty clear idea on how it should 
> work, but that mpacts the current design, as we need to woirk in 
> memory, and avoid to copy pages that have already been modifed in the 
> transaction boundaries. We also have to deal with the cleanup of old 
> versions, which also means we need to implement the version manager.
>
> To give you a insight on what I'm coming for, here is a draft of my 
> thoughts about this transaction manager (note that this is my vision, 
> that I haven't shared yet, because I don't think it's covering all the 
> moving parts, but still, I think it won't be that different to what we 
> will end with ) :
>
> Transaction support
> -------------------
>
> Mavibot must support transaction across many B-tres (ie, whatever the 
> number of b-trees we are updating, we should wait before committing 
> the changes up to the point the transaction is closed). That means we 
> may have many updates pending in memory. In any case, if a rollback is 
> done, the modified b-trees will remain intact, in the same state they 
> were before the transaction started.
>
> To achive this, we need to work in a working memory. As we may have to 
> fetch pages that are to be updated, we will face three cases :
> - the page is not in memory :
> we have to fetch them from disk, update them and store it in the 
> working memory we can read it back if needed
> - the page is in the cache :
> we have to copy it to the working memory
> - the page is in the working memory : we simply update it, leaving it 
> in the working memory.
>
> So if we have to modify the same page many times, it will be updated 
> in the working memory, we won't need to copy it.
>
> That actually change the current Mavibot implementation, as it's 
> copying pages no matter what. Here, if the page is coming from the 
> working memory, we just update it.
>
>
> Layout
> ------
>
>  Disk            Cache           Working
>   ___                            Memory
>  /   \          .------.
> +     +        /      /|         .------.
> |\___/|       .------. |       .------. |
> +     +       |      | |       |      | |
> |\___/| ----> |      | | ----> |      | |
> +     +       |      |/        |      |/
> |\___/|       .------.         .------.
> +     +
>  \___/
>
>  When the transaction is rollbacked, we simply delete the content of 
> the working memory.
>  When the transaction is committed, we write all the pages in the 
> working memory on disk, and into the cache, we update the header, and 
> we purge the working memory.
>
>  If a crash occurs during a transaction, then either the transaction 
> is rollbacked (if the process is still running) - which is just about 
> purging the working memory -, or there will be nothing to do if we 
> have to restart the application, as the modified pages were in memory 
> nad have been removed while the process crashed.
>
> OTOH, when we read, we must be sure that the B-trees are present, and 
> that we always use the B-tree revisions that are correlated. This is 
> critical when using LDAP, where many B-trees are updated during a sinlge write.
>
> The way to do that is to use a revision, and always use the B-trees 
> which revision is just below or equal to this revision. Let's see with 
> an example
>
>  +-----+---+---+---+---+---+---+---+...
>  |\ Rev|   |   |   |   |   |   |   |
>  | \   |   |   |   |   |   |   |   |
>  |  \  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
>  |B- \ |   |   |   |   |   |   |   |
>  |tree\|   |   |   |   |   |   | L | N
>  +-----+---+---+---+---+---+---+---+...
>  | aaa | X |   |   | X | X | X |   |
>  |     | 1 | 1 | 1 | 4 | 5 | 6 | 6 |
>  +-----+---+---+---+---+---+---+---+...
>  | bbb | X | X | X |   | X | X | X |
>  |     | 1 | 2 | 3 | 3 | 5 | 6 | 7 |
>  +-----+---+---+---+---+---+---+---+...
>  | ccc | X |   | X | X | X |   | X |
>  |     | 1 | 1 | 3 | 4 | 5 | 5 | 7 |
>  +-----+---+---+---+---+---+---+---+...
>  | ddd | X | X |   |   | X |   |   |
>  |     | 1 | 2 | 2 | 2 | 5 | 5 | 5 |
>  +-----+---+---+---+---+---+---+---+...
>  | eee | X |   | X | X |   | X |   |
>  |     | 1 | 1 | 3 | 4 | 4 | 6 | 6 |
>  +-----+---+---+---+---+---+---+---+...
>
> L : Latest revision
> N : New revision
>
> In this table, we have had 7 updates, and 7 revsions. As all the 
> B-trees haven't been updated, we may have some 'holes' in the revision 
> associated to a B-tree. Typically, here, the 'aaa' B-tree will have 4 
> revisions (1, 4,
> 5 and 6) created, because this B-tree has been updated
> 4 times.
>
> Doing a read on revision 7 will use the following B-trees : aaa[6], 
> bbb[7], ccc[7], ddd[5], eee[6].
>
>
> All in all, a read is associated to a global transaction revision 
> which is used as a marker. Every B-tree update is done using the new revision.
>
> Managing on-going reads
> -----------------------
>
> We maintain all the B-tree revisions associated with a read, until the 
> read is completed, we need to get rid of the old revision when we are done.
> This is critical to keep the database size to a minimum. In order to 
> do that, we must know if an older revision is still in use. We then 
> need to keep a common data structure that is shared across threads, 
> which will hold all the used revisions, ordered. With the previous 
> example, assuming revisions 2, 4 and 6 are in use, we may use a linked list for that
purpose :
>
>    +---+   +---+   +---+
> o--| 2 |<--| 4 |<--| 6 |
>    +--2+   +--1+   +--3+
>
> When the read using rev 4 is done, we remove it from this list :
>           +-------------+
>    +---+  |    +---+    | +---+
> o--| 2 |<-+  X-| 4 |    +-| 6 |
>    +--2+       +--0+      +--3+
>
> The two remaining revisions are still in use, and one of them is older.
> We can clean the B-tree pages associated with revision 4 if there is 
> already a new revision for this B-tree, and for the pages that have 
> been copied.
>
> When the read using rev 2 is done, we can clean all the B-tree pages 
> that are associated with every revision below 6, as soon as there is a 
> B-tree revision above the oldest one.
>
> An other important point : each revision might be used more than once.
> Releasing a read will decrement the count, and when it goes down to 0, 
> we can safely remove the element from the list. We need a thread-safe 
> data structure for that.
>
> Managing the list
> -----------------
>
> let's go a bit deeper in the management of this list. It has 
> interesting characteristics :
>
> - elements are added by a unique thread : the writter
> - elements are not removed unless the count becomes 0
> - the count is equal to the maximum number of thread that can use the 
> element at some point
> - it's not because the counter is N that N thread *are* using this 
> element
> : we may eventually have no thread using it ! (but they *may* come 
> back
> later...)
> - each element must be be protected agains concurrent update (which is 
> a different requirement than protecting the full list)
> - any thread requiring an element in the list will always use the head 
> of the least (which is the most recent)
> - we don't do any list traversal ! Once a thread uses an element, it 
> remembers about its position
> - no user Thread is allowed to remove an element from this list. The 
> only thread that can do that is the Cleaner Thread
>
>
> The last characteristic is interesting : that means we can safely 
> depends on one single thread to do the cleaning, so there is no 
> concurrent access on teh list while deleting elements.
>
> The deletion can be done in two ways :
> - from the tail of the list, if the element's counter is equal to zero 
> (and we can iterate)
> - from the inner of the list, when the element's counter to be removed 
> is zero
>
> The first way is simpler, but it holds potentially a lot of element if 
> the oldest one is there for a long time The second way implies we can 
> relink elements in the list, ie this is a double-linked list
>
> Let's assume we go for the first way. In any case, the list will
> *always* have at least one element, even if its counter is zero :
>
>    head
>    +---+
>    | N | revision
>    +---+
>    | 0 | users counter (AtomicInteger)
>    +---+
>    |   |--o
>    +---+
> o--|   |
>    +---+
>
> A thread wanting to use revision N will simply increment the counter, 
> which is an AtomicInteger, guaranteing that we can't have a wrong 
> update of this value. The revision will never change.
> At this point, using the head is quite straight-forward.
>
> * Adding an element
>
> So the writter creates a new revision. This will create a new element :
>
>     new      head
>    +---+     +---+
>    |N+1|     | N |<<--- T1, T2, T5
>    +---+     +---+
>    | 0 |     | 3 |
>    +---+     +---+
>    |   |--o  |   |--o
>    +---+     +---+
> o--|   |  o--|   |
>    +---+     +---+
>
> The Head should always be pointing to one single cell, and any thread 
> should be able to get it. This is easy to do if we use an AtomicReference :
>
>     new      [head]
>    +---+     +---+
>    |N+1|     | N |<<--- T1, T2, T5
>    +---+     +---+
>    | 0 |     | 3 |
>    +---+     +---+
>    |   |--o  |   |--o
>    +---+     +---+
> o--|   |  o--|   |
>    +---+     +---+
>
> Swapping the head is done after we have updated the links, which is 
> done by the writter thread :
>
> Step 1 :
>
> Attach the head to the new element
>
>     new      [head]
>    +---+     +---+
>    |N+1|     | N |<<--- T1, T2, T5
>    +---+     +---+
>    | 0 |     | 3 |
>    +---+     +---+
>    |   |--o  |   |--o
>    +---+     +---+
> o--|   |<----|   |
>    +---+     +---+
>
> Step 2 :
>
> Attach the new element to the head (Here, at the same time, a new 
> thread needs the latest revison (still N))
>
>
>     new      [head]
>    +---+     +---+
>    |N+1|     | N |<<--- T1, T2, T5, T7
>    +---+     +---+
>    | 0 |     | 4 |
>    +---+     +---+
>    |   |---->|   |--o
>    +---+     +---+
> o--|   |<----|   |
>    +---+     +---+
>
> Step 3 :
>
> Swap the head
>
>    [head]
>    +---+     +---+
>    |N+1|     | N |<<--- T1, T2, T5, T7
>    +---+     +---+
>    | 0 |     | 4 |
>    +---+     +---+
>    |   |---->|   |--o
>    +---+     +---+
> o--|   |<----|   |
>    +---+     +---+
>
> Step 4 :
>
> A new thread need the latest revision
>
>
>            +--- T8
>    [head]  |
>    +---+   | +---+
>    |N+1|<<-+ | N |<<--- T1, T2, T5, T7
>    +---+     +---+
>    | 1 |     | 4 |
>    +---+     +---+
>    |   |---->|   |--o
>    +---+     +---+
> o--|   |<----|   |
>    +---+     +---+
>
> At this point, no other thread will *ever* be attached to revision N
>
> This can go on an on, with new revisions attached to the list by the 
> writter thread. Now, let's consider two different use case :
> 1) when the oldest revision has a counter going down to zero
> 2) when a revision which is not the head nor the oldest that is going 
> to zero
>
> First case, a first approach :
>
> This is the simplest. We have to remove the element from the lists, 
> and check if its parent's counter is also zero and is not the head. If 
> so, we iterate until we meet an element which counter is not zero and 
> is not the head.
>
> In any case, we simply have to change its parent's next pointer, and 
> to remove the element from the list :
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   | +---+     +---+
>    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> the element, setting teh counter to 0
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
>    +---+     +---+     +---+     +---+
>    |   |---->|   |---->|   |---->|   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> The thread that set the counter to 0
>
> Step 1 :
>
> Detach the last element parent's next pointer
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   | +---+     +---+
>    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 0 |     | 0 |
>    +---+     +---+     +---+     +---+
>    |   |---->|   |---->|   |--o  |   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> Step 2 :
> Iterate if the parent's counter is 0
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   | +---+     +---+
>    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 0 |     | 0 |
>    +---+     +---+     +---+     +---+
>    |   |---->|   |--o  |   |--o  |   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> Step 3 :
> Detach the last element which counter is 0 prev pointer
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   |     +---+     +---+
>    |N+1|<<-+ | N |<<-+     |N-1|     |N-2|x---- T9
>    +---+     +---+         +---+     +---+
>    | 1 |     | 4 |         | 0 |     | 0 |
>    +---+     +---+         +---+     +---+
>    |   |---->|   |--o      |   |--o  |   |--o
>    +---+     +---+         +---+     +---+
> o--|   |<----|   |      o--|   |<----|   |
>    +---+     +---+         +---+     +---+
>
> Step 4 :
>
>  We can now dispose the unlinked elements.
>
> What if a thread is freeing an element while we are already free 
> another one ? This can be exposed using the previous sample :
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   | +---+     +---+
>    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> the element, setting teh counter to 0
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 1 |     | 0 |  Was 1
>    +---+     +---+     +---+     +---+
>    |   |---->|   |---->|   |---->|   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> Step 0+
>
> A thread (10) release the N-1 revison. We have two possibilities here :
>
> 1) The N-2 parent's next is not yet detached
>
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |
>    +---+   | +---+   | +---+     +---+
>    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> the element, setting teh counter to 0
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
>    +---+     +---+     +---+     +---+
>    |   |---->|   |---->|   |---->|   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> We just continue walking the list up to the first element with a non 
> zero counter
>
> 2) The N-2 paren'ts next is already detached
>
>
>
>            +--- T8   +-- T1, T2, T5, T7
>    [head]  |         |         +-- T10
>    +---+   | +---+   | +---+   | +---+
>    |N+1|<<-+ | N |<<-+ |N-1|x--+ |N-2|x---- T9 This thread will 
> release the element, setting teh counter to 0
>    +---+     +---+     +---+     +---+
>    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
>    +---+     +---+     +---+     +---+
>    |   |---->|   |---->|   |--o  |   |--o
>    +---+     +---+     +---+     +---+
> o--|   |<----|   |<----|   |<----|   |
>    +---+     +---+     +---+     +---+
>
> In this case, we will have 2 threads dealing with two elements to be 
> cleaned up. This is not a problem, as T9 will now detach the N-2 
> element from teh N-1 one and T9 will detach the N-1 element from the 
> list. All is safe.
>
> About the removed elements
> --------------------------
>
> Once a thread has detached some elements which counters are equal to 
> 0, what should we do with them ? As we will update the disk with the 
> removed pages, we have to send them to teh Writter thread. This can be 
> done by pushing teminto a deque (and more specifically a
> ConcurrentLinkedDeque)
>
> Another option is that the writer thread handle the cleaning of this 
> queue when it is weaken up or periodically. In this case, we don't 
> need to push the element sinto a deque, and the reader threads will 
> just set the counter to 0.
>
> Second case
> -----------
>
> What if we accept the removal of elements from the middle of the list ?
> This is way more complex, as we have to be sure we don't break the chain.
> One way to get rid of this constraint is to enforce the writer thread 
> to do the job, by periodically checing the whole list (which shuld not 
> be too costly). In this case, the reader thread never update the 
> element's parent's next pointer, they just decrement teh counter.
> When this counter becomes 0, then we can remove the element.
>
> Solution
> ========
>
> We will separate the threads into 2 categories :
> - readers (we may have many)
> - writer (we have only one)
>
> The readers always peek the latest revision (which is the head of the 
> list). It then increment the counter. When teh thread does not need 
> the revision anymore, the counter is decremented. This is the only 
> action the reader thread will ever do on the elements and on the list, 
> but it will always keep a reference to the element into the session.
>
> The writer thread has a lot more to do :
> - it is responsible for adding element at the top of the list (see upper).
> Once the head is swapped, every new reader will get the added element 
> as a reference.
> - it also has to check in the list for the elements which counter is 
> 0, if teh list is longer than 1 element. This check can be done after 
> each update, or periodically (every N seconds if the number of 
> elements is > 1, or after M additions) if tht help saving updates on disk.
>
> Referencing pages
> -----------------
>
> All the B-tree revisions are kept into the BoB B-tree, so it's easy to 
> retrieve the rootPage for a given revision.
>
> As all the pages can't be hold in memory, we sometime have to reach a 
> page from disk or from the cache. This is done following this simple algorithm :
>
>   for a given page with offset O
>     if the cache contains an instance for O
>       return the page
>     else
>       fetch the page with offset O from disk
>       store it in the cache
>
> In our case, we just add a layer, which will check in the working 
> memory for the instance of a page, given its offset.
>
>
>


--
Ashma Shrestha
PHP Developer
(770) 310-9456
ashma.shrestha@gmail.com
Mime
View raw message