directory-dev mailing list archives

From Alex Karasulu <aok...@bellsouth.net>
Subject [seda] Race condition between disconnect and output events
Date Thu, 16 Sep 2004 14:56:31 GMT
On Thu, 2004-09-16 at 04:38, jira@apache.org wrote:
> The following comment has been added to this issue:
> 
>      Author: Trustin Lee
>     Created: Thu, 16 Sep 2004 1:37 AM
>        Body:

> If it is guaranteed that all Subscribers know the list
> of managed channels, we will be able to resolve the 

For the record, here is the first case (I forgot to include it):

1). the channel was not put into the output manager before first output
event was processed

Are you suggesting that there be a single list of channels which all
Subscribers can see?  Right now only the Input/OutputManagers have
access to the channel.  I'm just trying to understand how this will make
sure the OutputEvent is processed after the ConnectEvent. 

> first case.  Subscribing and unsubscribing rarely happens, 
> so it should be okay.  The performance problem exists 
> when connections are very often established and 
> closed soon after.

Right, that's when you need the synchronization constructs, which are
expensive.

> Using a priority queue will be helpful here.  This solution brings 
> up another synchronization issue that can slow down overall 
> performance per channel although the synchronization block (getting nextval) 
> is small enough.  Plus a customized high-performance priority queue 
> implementation is required.  Each event type will have its own 
> priority, and the sequence of an event will be the second priority, 
> which can be disabled by the user.  This solution will solve issue 
> DIRSEDA-5, too.

OK, bear with me.  What you suggest is a priority queue based on two
different kinds of priorities.  The first kind is an Event type priority
and the second kind is a priority that orders events of the same type. 
Is this correct?
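
To make that reading concrete, here is a minimal sketch of a queue ordered first by event type and then by arrival sequence.  Everything in it is an assumption for illustration: the EventType ordering, the PrioritizedEvent class and the single shared counter are hypothetical and are not part of the seda API.  It uses the JDK's PriorityBlockingQueue rather than the customized implementation mentioned above.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; none of this is the seda API.
class TwoLevelPriorityDemo {

    // First-level priority: declaration order, earliest first (an assumption).
    enum EventType { CONNECT, INPUT, OUTPUT, DISCONNECT }

    static final class PrioritizedEvent {
        // Shared counter standing in for the "nextval" step.
        private static final AtomicLong NEXT_SEQ = new AtomicLong();

        final EventType type;
        final long seq;       // second-level priority: creation order
        final String payload;

        PrioritizedEvent(EventType type, String payload) {
            this.type = type;
            this.seq = NEXT_SEQ.getAndIncrement();
            this.payload = payload;
        }
    }

    // Compare by event type first, then by sequence number within a type.
    static final Comparator<PrioritizedEvent> ORDER =
        Comparator.comparing((PrioritizedEvent e) -> e.type)
                  .thenComparingLong(e -> e.seq);

    static PriorityBlockingQueue<PrioritizedEvent> newQueue() {
        return new PriorityBlockingQueue<>(16, ORDER);
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<PrioritizedEvent> queue = newQueue();
        queue.add(new PrioritizedEvent(EventType.DISCONNECT, "bye"));
        queue.add(new PrioritizedEvent(EventType.OUTPUT, "first"));
        queue.add(new PrioritizedEvent(EventType.CONNECT, "hello"));
        queue.add(new PrioritizedEvent(EventType.OUTPUT, "second"));

        PrioritizedEvent e;
        while ((e = queue.poll()) != null) {
            System.out.println(e.type + " " + e.payload);
        }
    }
}
```

Running main drains the queue as CONNECT hello, OUTPUT first, OUTPUT second, DISCONNECT bye, even though the disconnect was enqueued first.  The ordering only applies to events that are already sitting in the queue at the moment of the poll.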

> The second case is more complicated.  In the worst case, we can receive 
> an output event one or two seconds after the disconnection event has 
> arrived.  There is no easy solution because the EventRouter cannot 
> predict whether more output events are scheduled.  Notifying the user 
> that the event was not processed due to an unexpected disconnection 
> would be the best we can do; the user will choose whether to retry it 
> later or just to drop it.

Here's the second case for continuity's sake:

2). the channel was removed from the output manager before all the
output events could be processed

Yes this is a bit more complex.  Obviously there is nothing you can do
if the client falls off the face of the earth.  Even if you had the
reference to the channel it does no good to write to a closed channel. 
Something's going to fail regardless of what we do.  
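
The "notify the user" idea from the comment could look roughly like the sketch below.  It is only an illustration under stated assumptions: NotifyingOutputManager, Channel and UndeliveredListener are hypothetical stand-ins, not the actual DefaultOutputManager or its monitor interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; these types are stand-ins, not the seda classes.
class NotifyingOutputManager {

    /** Stand-in for a client channel. */
    interface Channel { void write(byte[] data); }

    /** Callback the user supplies to learn about output that was dropped. */
    interface UndeliveredListener { void undelivered(String clientKey, byte[] data); }

    private final Map<String, Channel> channels = new ConcurrentHashMap<>();
    private final UndeliveredListener listener;

    NotifyingOutputManager(UndeliveredListener listener) { this.listener = listener; }

    void register(String clientKey, Channel channel) { channels.put(clientKey, channel); }

    void unregister(String clientKey) { channels.remove(clientKey); }

    /** Writes if the channel is still registered; otherwise hands the data back. */
    boolean write(String clientKey, byte[] data) {
        Channel channel = channels.get(clientKey);
        if (channel == null) {
            listener.undelivered(clientKey, data);  // user decides: retry or drop
            return false;
        }
        channel.write(data);
        return true;
    }

    public static void main(String[] args) {
        List<String> dropped = new ArrayList<>();
        NotifyingOutputManager manager =
            new NotifyingOutputManager((key, data) -> dropped.add(key));
        manager.register("127.0.0.1:2402", data -> { /* pretend to write */ });
        System.out.println(manager.write("127.0.0.1:2402", "Hello".getBytes()));
        manager.unregister("127.0.0.1:2402");
        System.out.println(manager.write("127.0.0.1:2402", "world".getBytes()));
        System.out.println(dropped);
    }
}
```

This does not remove the race; a channel can still be unregistered between the lookup and the write.  It only turns a silent miss into something the caller can act on.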

What perplexes me about this situation is the fact that the client is
synchronous in the connect->write->read->disconnect sequence.  Here's
the client code out of the test case:

   1     EchoTCPClient client = new EchoTCPClient();
   2     client.connect( "localhost", 7 );
   3     byte[] toSend = "Hello world!".getBytes();
   4     byte[] recieved = new byte[toSend.length];
   5     client.getOutputStream().write( toSend );
   6     client.getInputStream().read( recieved );
   7     client.disconnect();
   8     assertEquals( new String( toSend ), new String( recieved ) );

So in lines 6 & 7 the client must read all the input before the
disconnect occurs and triggers a DisconnectEvent.  The server is not
going to disconnect unless a specific protocol message triggers it,
like an LDAP UnbindRequest; here we have nothing like that.

The question then is how the heck is a DisconnectEvent outrunning an
OutputEvent when all OutputEvents should have been processed already
before the DisconnectEvent is even created?  Can you see the ugliness of
staged event-driven architectures when it comes to debugging them?

BTW the fact that this client does a synchronous read before it
disconnects does not mean case 2 is out of the question every time. 
Other protocols can still have a DisconnectEvent outrun other
OutputEvents, as in LDAP with the UnbindRequest.  Other requests, like
SearchRequests whose responses are still being processed, can be
terminated by a disconnect due to an UnbindRequest.

I guess this is more food for thought.  I still want to think about this
priority queue approach.  I guess the PQ works if you have received all
the events you need to order by the time you dequeue.  If some events
just have not arrived yet but should be the next to be processed, then
I'm afraid the PQ will fail us.  I'm thinking we need something more:
something where there is centralized accounting going on for events and
the other events they generate.  This way stages can determine event
processing order and even use Barriers to synchronize or join multiple
threads across stages.  This however scares me because of the cost to
synchronize.
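
One possible shape for that accounting, purely as a sketch: a per-client ledger that counts output events still in flight and holds back the channel removal until the count reaches zero.  ChannelLedger and its methods are hypothetical names, not existing seda classes, and the plain synchronized blocks are exactly the cost I'm worried about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-client event accounting; not the seda API.
class ChannelLedger {

    private static final class Entry {
        int pendingOutputs;
        Runnable deferredDisconnect;  // set when a disconnect arrives early
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Call when an upstream stage creates an output event for the client. */
    synchronized void outputScheduled(String clientKey) {
        entries.computeIfAbsent(clientKey, k -> new Entry()).pendingOutputs++;
    }

    /** Call after the output stage has flushed (or given up on) the event. */
    void outputCompleted(String clientKey) {
        Runnable toRun = null;
        synchronized (this) {
            Entry e = entries.get(clientKey);
            if (e == null) {
                return;
            }
            if (e.pendingOutputs > 0) {
                e.pendingOutputs--;
            }
            if (e.pendingOutputs == 0) {
                entries.remove(clientKey);
                toRun = e.deferredDisconnect;
            }
        }
        if (toRun != null) {
            toRun.run();  // run the held-back removal outside the lock
        }
    }

    /** Runs removeChannel now, or defers it while outputs are in flight. */
    void requestDisconnect(String clientKey, Runnable removeChannel) {
        synchronized (this) {
            Entry e = entries.get(clientKey);
            if (e != null && e.pendingOutputs > 0) {
                e.deferredDisconnect = removeChannel;
                return;
            }
            entries.remove(clientKey);
        }
        removeChannel.run();
    }

    public static void main(String[] args) {
        ChannelLedger ledger = new ChannelLedger();
        List<String> log = new ArrayList<>();
        ledger.outputScheduled("client");
        ledger.requestDisconnect("client", () -> log.add("channel removed"));
        log.add("output flushed");
        ledger.outputCompleted("client");
        System.out.println(log);
    }
}
```

For this to help, outputScheduled has to be called when the event is created upstream, not when the output stage first sees it; otherwise an event that has not arrived yet is invisible to the ledger, just as it is to the PQ.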

Alex

> ---------------------------------------------------------------------
> View this comment:
>   http://issues.apache.org/jira/browse/DIRSEDA-6?page=comments#action_53128
> 
> ---------------------------------------------------------------------
> View the issue:
>   http://issues.apache.org/jira/browse/DIRSEDA-6
> 
> Here is an overview of the issue:
> ---------------------------------------------------------------------
>         Key: DIRSEDA-6
>     Summary: Race condition between disconnect and output events
>        Type: Bug
> 
>      Status: Open
>    Priority: Major
> 
>     Project: Seda Framework
> 
>    Assignee: Alex Karasulu
>    Reporter: Alex Karasulu
> 
>     Created: Thu, 16 Sep 2004 12:23 AM
>     Updated: Thu, 16 Sep 2004 1:37 AM
> 
> Description:
> On occasion I get the following failure from the echo server test:
> 
> -- o error message o --
> 
> Sep 16, 2004 2:50:52 AM org.apache.seda.output.LoggingOutputMonitor channelMissing WARNING:
> org.apache.seda.output.DefaultOutputManager@2d9c06 could not find channel for client 127.0.0.1:7<-127.0.0.1:2402
> 
> -- o error message o --
> 
> Now this means a channel for the client was expected in the output manager but was not
> found.  This can be caused by two possible conditions:
> 
>  1). the channel was not put into the output manager before first output event was processed
>  2). the channel was removed from the output manager before all the output events could
>      be processed
> 
> In the first case we have a race condition between the thread processing a ConnectEvent
> and a thread processing an OutputEvent.  The ConnectEvent processing is really slow in this
> case because all the stages were traversed via input->decode->reqproc->output before
> the ConnectEvent was handled.  That's a little far fetched so I'm going to presume that the
> second case is more likely.
> 
> In the second case the race condition is between the thread processing a DisconnectEvent
> and a thread processing an OutputEvent.  The DisconnectEvent in this case is outrunning the
> processing of the OutputEvent.  Before the OutputEvent can flush out data to the client the
> channel to the client is removed from the output manager by the DisconnectEvent.
> 

