uima-user mailing list archives

From Burn Lewis <burnle...@gmail.com>
Subject Re: Providing collection definitions to multiple CasMultipliers
Date Wed, 17 Oct 2012 12:28:02 GMT
On Tue, Oct 16, 2012 at 6:47 PM, Simon <simon@teratext.saic.com.au> wrote:

>
>
> I was hoping to deploy the same Aggregate many times and each Aggregate
> process files from a different directory. But I was wondering how to tell
> each Aggregate which directory to process, and not use config files to do
> this.
>
If each aggregate is reading from a different directory, then they're not
really the same aggregate, so they should be on different queues.

>
> Each Aggregate would have the same input queue, so it seems providing the
> directory path (in an input CAS) to a newly deployed aggregate is not
> possible since there may be already deployed identical aggregates with
> the same input queue.
>
I was guessing here that your work item was a directory and a bunch of
files in it, so any of the aggregates on the queue could do the work.
Having multiple aggregates on the same queue speeds things up as each can
be working on different pieces of work at the same time. Note that each
work item (CAS) is processed by only one of the aggregates.  Depending on
your work-load balance, dedicating each aggregate to a single directory
might be less efficient than allowing all aggregates to process any
directory.
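
A minimal sketch of that shared-queue behaviour, using plain
java.util.concurrent rather than UIMA-AS. None of this is UIMA API: the
class name, the worker names, and the directory paths are made up for
illustration, and an in-memory queue stands in for the real input queue.
It only shows that when identical workers pull from one queue, each work
item is taken by exactly one of them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for several identical aggregates sharing one input queue.
public class SharedQueueSketch {

    // Runs `workers` identical consumers against one shared queue and returns,
    // for each work item, the list of workers that processed it.
    static Map<String, List<String>> run(List<String> directories, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(directories);
        Map<String, List<String>> processedBy = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            final String name = "aggregate-" + i; // illustrative worker name
            pool.submit(() -> {
                String dir;
                // poll() removes the item from the queue, so no other
                // worker can receive the same work item.
                while ((dir = queue.poll()) != null) {
                    processedBy
                        .computeIfAbsent(dir, k -> new CopyOnWriteArrayList<>())
                        .add(name);
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processedBy;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical directory paths used as work items.
        List<String> dirs = Arrays.asList("/data/a", "/data/b", "/data/c", "/data/d");
        run(dirs, 2).forEach((dir, who) -> System.out.println(dir + " -> " + who));
    }
}
```

Which worker ends up with which directory varies from run to run; the
point is only that any worker can take any directory and none is
processed twice.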

>
> Thanks
> Simon
>
>

If my guesses are off the mark, please clarify with a small example of
files and directories.

~Burn
