river-dev mailing list archives

From Rick Moynihan <r...@calicojack.co.uk>
Subject Re: Concurrency and River
Date Tue, 02 Oct 2007 10:13:47 GMT
Dan Creswell wrote:
> I've yet to see exactly how Erlang does failure detection of processes.
>  I guess there might be some timeout value somewhere in respect of
> messages reaching a destination etc but I've not seen a description of
> this aspect of Erlang.

Failure detection in Erlang appears to occur at the process level rather 
than at the message level.  All processes, whether local or remote, are 
typically treated as unreliable.

> Further whilst Erlang might do failure detection (of a form) solving the
> issues of failure are the difficult bit and I'm less convinced Erlang
> offers much here.  For example, one solution to failure is replication
> and it appears you are (unsurprisingly) left to do that for yourself
> right now.  Putting my high-performance hat on I'd also point out that
> replication has recognized limits especially when it's done with
> transactions which leads to even more esoteric solutions that are
> largely about appropriate architecture/interactions and less about
> shared-nothing or message passing.

Failure handling is a *BIG* thing in Erlang/OTP, and is touted as one of 
its main strengths along with concurrency.  Again, I'd reiterate that I 
don't have much practical experience with Erlang, but it does seem to 
have a lot to offer in this regard.

Erlang/OTP provides a notion of supervisor processes and supervision 
trees, where the failure or crash of a process is detected by its 
supervisor, which can then handle the failure appropriately (usually by 
logging the error, or restarting the process according to a particular 
restart strategy).  Hand in hand with this go Erlang's dynamic code 
updates, which mean that when part of a system crashes or fails you can 
fix the error and deploy the fix to the live system, gracefully 
restarting *JUST* the process that failed.
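
To make the supervisor idea a little more concrete, here is a rough 
sketch in Python rather than Erlang.  It is not how OTP implements 
supervisors: the Supervisor class, make_flaky_worker and the 
max_restarts limit are all names I've made up for illustration, and the 
"child" is an ordinary function call in the same thread rather than an 
isolated process.  It only shows the shape of the idea: the supervisor 
notices the crash, logs it, and restarts the child until a restart 
limit is reached.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("supervisor")


class Supervisor:
    """Hypothetical, single-threaded stand-in for an Erlang-style supervisor."""

    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts  # assumed restart limit
        self.restarts = 0

    def run(self, start_child):
        """Run the child; on a crash, log it and restart until the limit is hit."""
        while True:
            try:
                return start_child()
            except Exception as exc:
                log.error("child crashed: %r", exc)
                if self.restarts >= self.max_restarts:
                    # Give up and let the failure propagate upwards.
                    raise RuntimeError("restart limit reached") from exc
                self.restarts += 1


def make_flaky_worker(failures):
    """Build a worker that crashes `failures` times and then succeeds."""
    state = {"remaining": failures}

    def worker():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ValueError("simulated crash")
        return "done"

    return worker


sup = Supervisor(max_restarts=3)
result = sup.run(make_flaky_worker(failures=2))
print(result, "after", sup.restarts, "restarts")
```

Raising once the limit is hit loosely mirrors the way a supervisor 
that cannot keep its child alive gives up and fails in turn, leaving 
the decision to whatever sits above it in the tree.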

It's claimed that these properties have led to Erlang systems being 
built with 99.9999999% (nine nines) reliability.
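
For a sense of scale, here is the back-of-the-envelope arithmetic, 
assuming the figure is read as availability over a 365-day year (that 
reading is my assumption; the claim itself doesn't say how it was 
measured):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # non-leap year
UNAVAILABLE_PARTS_PER_BILLION = 1       # 100% - 99.9999999% = 1e-9

# Integer arithmetic up to the final division avoids float rounding noise.
downtime_ms_per_year = (
    SECONDS_PER_YEAR * 1000 * UNAVAILABLE_PARTS_PER_BILLION / 1_000_000_000
)
print(f"{downtime_ms_per_year:.3f} ms of downtime per year")
```

On that reading, nine nines allows roughly 31.5 milliseconds of 
downtime per year.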

Rick Moynihan
Software Engineer
Calico Jack LTD
