uima-user mailing list archives

From Arun Tewatia <arun.tewa...@orkash.com>
Subject Re: remoteAnalysisEngine services not scaling to effect
Date Wed, 28 Sep 2011 12:54:45 GMT
Thanks, Greg Holmberg and Burn Lewis, for the replies.

You have understood correctly what I am trying to do.

> What you're doing is taking each step in your analysis engine and running it on
> one or more machines.

And yes, this will create the two problems you mentioned: network overhead and
lumpy behavior.

But then, as Burn Lewis mentioned, it has a disadvantage when some of the
annotators in the pipeline consume a lot of memory. Also, almost all of the
documents are of the same size.

My case is similar: some of the annotators in my pipeline consume a lot of
memory. So what I am trying to do is group a few annotators together, i.e.
divide the whole pipeline (about 15 AEs) into 2-3 aggregates.

That way I can maintain the ratios between the numbers of instances of these
aggregates. In the first stage I am trying to optimize performance by tuning
these ratios. In the second stage I'll use CAS multipliers to split the
documents into smaller pieces.
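A minimal sketch of how such instance ratios could be worked out, assuming you have measured the mean processing time per CAS for each aggregate. The idea is that a pipeline's throughput is limited by its slowest stage, i.e. the stage with the largest (time per CAS) / (number of instances), so each spare instance goes to the current bottleneck. The class name `InstanceRatio`, the method `allocate` and the timings in `main` are hypothetical and illustrative only, not measured values or part of any UIMA API.

```java
import java.util.Arrays;

public class InstanceRatio {

    // Hypothetical helper: splits a budget of instances across aggregates.
    // Every aggregate starts with one instance; each remaining instance is
    // given to whichever aggregate currently has the highest per-instance
    // load (mean ms per CAS divided by its instance count).
    static int[] allocate(double[] meanMillisPerCas, int totalInstances) {
        int stages = meanMillisPerCas.length;
        if (totalInstances < stages) {
            throw new IllegalArgumentException("need at least one instance per stage");
        }
        int[] instances = new int[stages];
        Arrays.fill(instances, 1);
        for (int spare = totalInstances - stages; spare > 0; spare--) {
            int bottleneck = 0;
            for (int i = 1; i < stages; i++) {
                if (meanMillisPerCas[i] / instances[i]
                        > meanMillisPerCas[bottleneck] / instances[bottleneck]) {
                    bottleneck = i;
                }
            }
            instances[bottleneck]++;
        }
        return instances;
    }

    public static void main(String[] args) {
        // Illustrative timings only: three aggregates taking 40 ms, 120 ms
        // and 20 ms per CAS, with 9 instances available in total.
        double[] millis = {40, 120, 20};
        System.out.println(Arrays.toString(allocate(millis, 9))); // prints [2, 6, 1]
    }
}
```

With these made-up timings the result matches the 40:120:20 (= 2:6:1) ratio, so every aggregate ends up with roughly 20 ms of work per CAS per instance. With real measurements the numbers will rarely divide this cleanly, which is why a greedy "feed the bottleneck" rule is used instead of plain proportional rounding.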

As for my problem of "asynchronous data in the database", it still persists.
I enabled FINE logging as Burn Lewis suggested.
I also monitored the depth of the CAS consumer's queue, which stayed at zero.
So I understand that there is no point in increasing the number of CAS
consumer instances, but even if I do, the data should still stay
synchronized. Shouldn't it?

From the logs I observed that the CASes are divided between the two running
CAS consumer instances, but some of the CASes seem to be missed: they didn't
go to either of the two instances. I can't understand why.
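A minimal sketch of one way to cross-check the logs, assuming each CAS carries some document ID and that you can collect the IDs the client submitted and the IDs each CAS consumer instance logged as processed. The class `MissingCasCheck`, the method `missing` and the sample IDs are hypothetical; the code only computes which submitted IDs no consumer instance reported, and does not say why they were missed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MissingCasCheck {

    // Hypothetical diagnostic: returns the submitted IDs that do not appear
    // in any consumer instance's list of processed IDs.
    static Set<String> missing(List<String> submitted, List<List<String>> seenPerConsumer) {
        Set<String> result = new LinkedHashSet<>(submitted); // keeps submission order
        for (List<String> seen : seenPerConsumer) {
            result.removeAll(seen);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative IDs only, standing in for values parsed from the logs.
        List<String> submitted = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        List<String> consumerA = List.of("doc-1", "doc-4");
        List<String> consumerB = List.of("doc-2", "doc-5");
        System.out.println(missing(submitted, List.of(consumerA, consumerB))); // prints [doc-3]
    }
}
```

Knowing exactly which IDs are missing makes it possible to search the FINE-level logs of the earlier aggregates for those specific documents and see at which stage they stop appearing.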

Arun Tewatia

