uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed
Date Thu, 26 Jun 2008 17:23:20 GMT
Great minds think alike :-)

LeHouillier, Frank D. wrote:
> To test your theory that it is the writing of Annotations to the CAS
> that is taking so long I ran an annotator with this code: 
> 
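> import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
> import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
> import org.apache.uima.jcas.JCas;
> import org.apache.uima.jcas.tcas.Annotation;
> 
> public class TestAnnotator extends JCasAnnotator_ImplBase {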
> 
> 	@Override
> 	public void process(JCas arg0) throws AnalysisEngineProcessException {
> 
> 		int i = 0;
> 		
> 		while (i < 100000)
> 		{
> 			Annotation a = new Annotation(arg0);
> 			
> 			a.setBegin(1);
> 			a.setEnd(2);
> 			a.addToIndexes();
> 			
> 			i++;
> 		}
> 		
> 		System.out.println("Done");
> 
> 	}
> 
> } 
> 
> This takes less than two seconds to run on my laptop.  Is it possible
> your bottleneck isn't where you think it is?
> 
> -----Original Message-----
> From: rohan rai [mailto:hirohanin@gmail.com] 
> Sent: Thursday, June 26, 2008 12:04 PM
> To: uima-user@incubator.apache.org
> Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed
> 
> @Pascal: As I have already said, the timing does not scale linearly.
>               Secondly, the times I have specified are approximate.
> @Frank:
>      I was talking about the actual adding of annotations to the CAS.
>     A record refers to, let's say, text in tags like these: <a>.....</a>
>     and the document consists of such records.
>     Annotation is done via this method:
>                                MyType annotation = new MyType(jCas);
>                                annotation.setBegin(start);
>                                annotation.setEnd(end);
>                                annotation.addToIndexes();
>    This takes a lot of time, which is not acceptable.
> 
> Regards
> Rohan
> 
> 
> On Thu, Jun 26, 2008 at 8:15 PM, LeHouillier, Frank D. <
> Frank.LeHouillier@gd-ais.com> wrote:
> 
>> Just to clarify, what do you mean by "annotation"?  Is there a
>> specific Analysis Engine that you are using? What is a "record"? Is
>> this a document?  It would actually be surprising for many
>> applications if annotation were not the bottleneck, given that some
>> annotation processes are quite expensive, but this doesn't seem like
>> what you mean here. I can't tell from your question whether it is the
>> process that determines the annotations that is a burden or the actual
>> adding of the annotations to the CAS.
>>
>> -----Original Message-----
>> From: rohan rai [mailto:hirohanin@gmail.com]
>> Sent: Thursday, June 26, 2008 7:36 AM
>> To: uima-user@incubator.apache.org
>> Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
>>
>> When I profile a UIMA application, I see that annotation takes a lot
>> of time. Annotating 1 record takes around 0.06 seconds. You may say
>> that is good, but now scale up. It does not scale linearly, but here
>> is a rough estimate from the experiments I have done:
>> 6000 records take 6 min to annotate.
>> 800000 records take around 10 hrs to annotate, which is bad.
>> One thing to note is that I am treating each record individually as a
>> CAS. Even if I treat all the records as a single CAS, it takes around
>> 6-7 hrs, which is still not good in terms of speed.
>>
>> Is there a way out?
>> Can I improve performance by any means??
>>
>> Regards
>> Rohan
>>
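
As a rough sanity check on the figures quoted in this thread, the per-record costs can be worked out directly. The following is a minimal sketch in plain Java with no UIMA dependency. The class and method names are hypothetical, the inputs are the approximate timings reported above, and "less than two seconds" is taken as an upper bound of 2 s.

```java
public class ThroughputCheck {

    /** Average seconds spent per record (or per annotation). */
    static double secondsPerRecord(double totalSeconds, long records) {
        return totalSeconds / records;
    }

    /** Hours needed for the given number of records at a fixed per-record cost. */
    static double projectedHours(double secondsPerRecord, long records) {
        return secondsPerRecord * records / 3600.0;
    }

    public static void main(String[] args) {
        // Figures reported by Rohan (approximate).
        double smallRun = secondsPerRecord(6 * 60, 6_000);        // 6 min for 6000 records
        double largeRun = secondsPerRecord(10 * 3600, 800_000);   // ~10 hrs for 800000 records

        // Figure reported by Frank: 100000 addToIndexes() calls in under 2 s (upper bound).
        double indexingOnly = secondsPerRecord(2, 100_000);

        System.out.printf("6000-record run:   %.5f s/record%n", smallRun);
        System.out.printf("800000-record run: %.5f s/record%n", largeRun);
        System.out.printf("indexing only:     %.5f s/annotation (upper bound)%n", indexingOnly);

        // Linear extrapolation of the small run to the large run, for comparison.
        System.out.printf("800000 records at the small-run rate: %.1f hrs%n",
                projectedHours(smallRun, 800_000));
    }
}
```

If these numbers are roughly right, creating and indexing a single annotation costs orders of magnitude less than the reported per-record time, which is consistent with the suggestion that the bottleneck may lie elsewhere in the pipeline.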
