uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: DUCC- process_dd
Date Fri, 01 May 2015 04:31:24 GMT
Eddie,

I was using this same scenario and experimenting by trial and error to
compare it with UIMA AS, hoping to get a more scaled pipeline, since I think
UIMA AS can also do this. But with UIMA AS I am unable to match the processing
time of DUCC's default configuration that you mentioned.

Can you help me with this? I just want to scale using the best combination
of UIMA AS and DUCC, which I understand can be done using process_dd.
But how?
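
(For reference, the kind of job specification I have in mind is sketched
below. The descriptor paths, memory size and scheduling class are only
placeholders, and I am writing the ducc_submit parameter names from memory,
so please correct me if any of them are off.)

    # Hypothetical DUCC job specification: hand the whole pipeline to the job
    # process as a UIMA-AS deployment descriptor via process_DD.
    # All paths and values below are placeholders, not my actual settings.
    description           Scaled pipeline submitted via process_dd
    driver_descriptor_CR  desc/MyCollectionReader.xml
    process_DD            desc/MyDeploymentDescriptor.xml
    process_memory_size   4
    classpath             lib/*
    scheduling_class      normal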

Thanks in advance.

Reshu.

On 05/01/2015 03:28 AM, Eddie Epstein wrote:
> The simplest way of vertically scaling a Job process is to specify the
> analysis pipeline using core UIMA descriptors and then use
> --process_thread_count to say how many copies of the pipeline to deploy,
> each in a different thread. No use of UIMA-AS at all. Please check out
> the "Raw Text Processing" sample application that comes with DUCC.
>
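
(If I understand this correctly, the job specification would then reference
the core aggregate descriptor directly and skip the deployment descriptor
entirely, roughly like the sketch below. Again, the paths, counts and sizes
are placeholders and the ducc_submit parameter names are from memory.)

    # Hypothetical DUCC job specification for the approach described above:
    # a plain UIMA aggregate descriptor plus process_thread_count, no UIMA-AS.
    # All paths and values below are placeholders.
    description            Vertically scaled pipeline with core UIMA descriptors
    driver_descriptor_CR   desc/MyCollectionReader.xml
    process_descriptor_AE  desc/orkash/ae/aggregate/FlowController_Uima.xml
    process_thread_count   5
    process_memory_size    4
    classpath              lib/*
    scheduling_class       normal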
> On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <reshu.agarwal@orkash.com>
> wrote:
>
>> Oh! I misunderstood this. I thought this would scale both my Aggregate and
>> my AEs.
>>
>> I want to scale the aggregate as well as the individual AEs. Is there any
>> way of doing this in UIMA AS/DUCC?
>>
>>
>>
>> On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:
>>
>>> In an async aggregate you scale the individual AEs, not the aggregate as a
>>> whole. The configuration below should do that. Are there any warnings from
>>> dd2spring at startup with your configuration?
>>>
>>> <analysisEngine async="true">
>>>     <delegates>
>>>         <analysisEngine key="ChunkerDescriptor">
>>>             <scaleout numberOfInstances="5" />
>>>         </analysisEngine>
>>>         <analysisEngine key="NEDescriptor">
>>>             <scaleout numberOfInstances="5" />
>>>         </analysisEngine>
>>>         <analysisEngine key="StemmerDescriptor">
>>>             <scaleout numberOfInstances="5" />
>>>         </analysisEngine>
>>>         <analysisEngine key="ConsumerDescriptor">
>>>             <scaleout numberOfInstances="5" />
>>>         </analysisEngine>
>>>     </delegates>
>>> </analysisEngine>
>>>
>>> Jerry
>>>
>>> On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was trying to scale my processing pipeline to run in a DUCC environment
>>>> with UIMA-AS via process_dd. When I tried to scale using the configuration
>>>> given below, the threads started were not as expected:
>>>>
>>>>
>>>> <analysisEngineDeploymentDescription
>>>>         xmlns="http://uima.apache.org/resourceSpecifier">
>>>>
>>>>     <name>Uima v3 Deployment Descriptor</name>
>>>>     <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed
>>>>         Flow Controller</description>
>>>>
>>>>     <deployment protocol="jms" provider="activemq">
>>>>         <casPool numberOfCASes="5" />
>>>>         <service>
>>>>             <inputQueue endpoint="UIMA_Queue_test"
>>>>                 brokerURL="tcp://localhost:61617?jms.useCompression=true"
>>>>                 prefetch="0" />
>>>>             <topDescriptor>
>>>>                 <import location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
>>>>             </topDescriptor>
>>>>             <analysisEngine async="true" key="FlowControllerAgg"
>>>>                 internalReplyQueueScaleout="10" inputQueueScaleout="10">
>>>>                 <scaleout numberOfInstances="5" />
>>>>                 <delegates>
>>>>                     <analysisEngine key="ChunkerDescriptor">
>>>>                         <scaleout numberOfInstances="5" />
>>>>                     </analysisEngine>
>>>>                     <analysisEngine key="NEDescriptor">
>>>>                         <scaleout numberOfInstances="5" />
>>>>                     </analysisEngine>
>>>>                     <analysisEngine key="StemmerDescriptor">
>>>>                         <scaleout numberOfInstances="5" />
>>>>                     </analysisEngine>
>>>>                     <analysisEngine key="ConsumerDescriptor">
>>>>                         <scaleout numberOfInstances="5" />
>>>>                     </analysisEngine>
>>>>                 </delegates>
>>>>             </analysisEngine>
>>>>         </service>
>>>>     </deployment>
>>>>
>>>> </analysisEngineDeploymentDescription>
>>>>
>>>>
>>>> There should be 5 threads of FlowControllerAgg, where each thread will
>>>> have 5 more threads for each of ChunkerDescriptor, NEDescriptor,
>>>> StemmerDescriptor and ConsumerDescriptor.
>>>>
>>>> But I don't think this is actually happening in the case of DUCC.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Reshu.
>>>>

