uima-user mailing list archives

From Steve Suppe <ssu...@llnl.gov>
Subject RE: Server Socket Timeout Woes
Date Wed, 30 Apr 2008 16:48:56 GMT
Thanks for the kind words - it's all been out of necessity, not some grand 
scheme!  I too have thought about the load-balancing aspects of the 'job 
scheduler.'  Even without the ability to add/subtract additional resources 
(a nice feature), it seems that the current setup is missing some other 
niceties as well.

I find that in order for all of our nodes to be used, I have to 
'overshoot' the number of instances I'd really like to process.  This is 
because if, say, I had 10 worker nodes, and I started 10 instances, there's 
a good chance some workers would get 2 or more instances, while 
others would get 0.  So I oversaturate the lines and hope for the best.
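
To put a rough number on the overshoot effect: assuming each instance lands on a worker independently and uniformly at random (an assumption; nothing in this thread confirms how placement is actually decided), the expected number of idle workers is n * (1 - 1/n)^m for n workers and m instances. A minimal Java sketch, with hypothetical class and method names:

```java
class IdleWorkerEstimate {
    // Expected number of workers that receive no instance when `instances`
    // are assigned independently and uniformly at random to `workers`.
    // This placement model is an assumption, not documented behavior.
    static double expectedIdle(int workers, int instances) {
        return workers * Math.pow(1.0 - 1.0 / workers, instances);
    }

    public static void main(String[] args) {
        int workers = 10;
        for (int instances : new int[] {10, 20, 30}) {
            System.out.printf("%d instances -> %.2f idle workers expected%n",
                    instances, expectedIdle(workers, instances));
        }
    }
}
```

Under that assumption, 10 instances on 10 workers leave about 3.5 workers idle on average, and even 30 instances still leave about 0.4 idle, which is consistent with having to oversaturate.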

I think, as has been said in this thread, perhaps the best bet would be to 
allow a thread to get a resource simply for the length of a single 
processCas(), then release it back to the pool.  I suppose there are some 
overhead issues with this?  But at least you wouldn't worry about wasting 
so many threads all of the time.  Maybe a few different options, such as 
the current setup, a new thread per processCas, and maybe a way to gain 
priority?  So if you're constantly "checking out" the same type of thread, 
you're allowed to hold on to a longer "lease" of that thread, and overhead 
time goes down?  Something like DHCP, but for worker threads :)  Of course, 
that might be too complicated and not worth the effort.
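
As a rough illustration of the "hold a resource only for one processCas()" idea, here is a minimal sketch built on the JDK's ArrayBlockingQueue. It is not the Vinci or UIMA implementation; PerCallPool, withWorker and idleCount are hypothetical names, and the worker type W stands in for whatever a service connection would be:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical pool: a caller borrows a worker for exactly one unit of
// work and the worker goes back to the pool as soon as that unit is done.
class PerCallPool<W> {
    private final BlockingQueue<W> idle;

    PerCallPool(List<W> workers) {
        // Fair queue so waiting callers are served in arrival order.
        idle = new ArrayBlockingQueue<>(workers.size(), true, workers);
    }

    // Blocks until a worker is free, runs one unit of work on it,
    // then always returns the worker, even if the work throws.
    <R> R withWorker(Function<W, R> work) throws InterruptedException {
        W worker = idle.take();
        try {
            return work.apply(worker);
        } finally {
            idle.add(worker); // capacity equals the worker count, so this cannot overflow
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

The cost being debated above is the take/add pair on every call. For an in-process queue that cost is small, but if each checkout also implied a network reconnect it could dominate short calls, which is where a longer "lease" would start to pay off.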

It seems like taking a resource just long enough to perform one block of 
work (one processCas) is the simplest and most 'tried-and-true' 
form.  However, at least in most of our work, each processCas is really 
pretty quick, so it would look like a lot of overhead for switching threads 
around all of the time.  Of course 'pretty quick' is relative, and in 
computer-time is closer to an eternity.  But we're averaging 100s to 1000s 
of documents per second, so if we're ALWAYS setting up and tearing down, 
that could eat into our efficiency.

These are just some of my thoughts.  Does anyone have any ideas?


At 10:22 AM 4/29/2008, you wrote:
>I'm excited to see this thread for its affirmation that someone has 
>pushed Vinci scalability to the point that Steve has at LLNL.  Also, to 
>know the currently released version has some limitations.  At the risk of 
>diverting this thread, let me share what we've found.
>I'm on board with Adam's line of thinking.  We've just spent 2 weeks 
>experimenting with the various options for exclusive/random allocation of 
>Vinci services, finding that 'exclusive' is the most reliable way to 
>balance load (random sometimes hands all of the clients the same service 
>while other services go unused).  The phrase "when a service is needed" 
>isn't clear in the documentation.  As Adam indicated, our finding is that 
>"need" occurs only at client thread initialization time as opposed to each 
>process(CAS) call.  Additionally, "exclusive" is not exactly clear, as two 
>client threads can be handed the same service if the number of services 
>available is less than the number of threads initializing.  This behavior 
>is robust (better to get a remote than have nothing allocated), but it 
>isn't clear from our relatively small setup (two threads, two remotes) 
>what the word 'exclusive' means or how large a system can get before 
>'exclusive' pans out as the right/wrong approach.
>In the face of services starting/stopping on remote computers (e.g., 
>during multi-platform reboot), there seems to be no way to robustly take 
>advantage of additional services coming on-line.  If "when needed" meant 
>each process(CAS) call (as an option at least ... to trade the re-connect 
>negotiation overhead for dynamic scalability), then a system that 
>initializes to 5 remotes can balance out as 10, 20, 30 remotes come 
>online.  For now, we are using the CPE 'numToProcess' parameter to exit 
>the CPE, then construct a new CPE and re-enter the process() routine to 
>seek out new services periodically.
>Also, we are seeing a startup sequence that sometimes results in the first 
>document sent to each remote returning immediately with a 
>connection/timeout exception ... so we catch those items and re-submit 
>them at the end of the queue in case they really did exit due to a valid 
>timeout exception.
>Any feedback/collaboration would be appreciated.
>- Charles
> > Date: Wed, 23 Apr 2008 17:44:50 -0400
> > From: alally@alum.rpi.edu
> > To: uima-user@incubator.apache.org
> > Subject: Re: Server Socket Timeout Woes
> >
> > On Wed, Apr 23, 2008 at 4:39 PM, Steve Suppe <ssuppe@llnl.gov> wrote:
> > > Hello again,
> > >
> > > I think you are 100% right here. I managed to roll back to my patched
> > > version of UIMA 2.1.0. In this one, I implemented the pool of threads as
> > > automatically expandable. This seemed to solve all of our problems, and
> > > things are chugging away very happily now.
> > >
> > > I know this is the user group, but is this something I should look to
> > > contributing somehow?
> >
> > Definitely - you could [...] a JIRA issue and attach a patch. We
> > should probably think a bit about how this thread pool was supposed to
> > work, though. My first thought is that the clients would round-robin
> > over the available threads, and each thread would be used for only one
> > request and would then be relinquished back into the pool. But
> > instead, it looks like the client holds onto a thread for the entire
> > time that client is connected, which doesn't make a whole lot of
> > sense. If the thread pool worked in a more sensible way, it might not
> > need to be expandable.
> >
> > -Adam