uima-user mailing list archives

From "LeHouillier, Frank D." <Frank.LeHouill...@gd-ais.com>
Subject RE: Annotation (Indexing) a bottleneck in UIMA in terms of speed
Date Thu, 26 Jun 2008 17:19:53 GMT
To test your theory that it is the writing of Annotations to the CAS
that is taking so long, I ran an annotator with this code:

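import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {

	public void process(JCas arg0) throws AnalysisEngineProcessException {
		// Create 100,000 annotations against the JCas to time the raw cost.
		for (int i = 0; i < 100000; i++) {
			Annotation a = new Annotation(arg0);
		}
	}
}
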
This takes less than two seconds to run on my laptop.  Is it possible
your bottleneck isn't where you think it is?
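
One way to check where the time goes is to wrap the suspected step in a timed loop. The following is only a minimal sketch using the standard library: the `TimedLoop` class and `timeMillis` helper are hypothetical names, and the `new Object()` allocation is a stand-in for whatever call is under suspicion (inside an annotator's process method it would be the annotation creation itself).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: times a repeated task with System.nanoTime().
final class TimedLoop {

    /** Runs the task the given number of times and returns elapsed milliseconds. */
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): allocate an object and keep a reference to it.
        List<Object> sink = new ArrayList<>();
        long elapsed = timeMillis(100_000, () -> sink.add(new Object()));
        System.out.println("100000 iterations took " + elapsed + " ms");
    }
}
```

If the isolated step turns out to be fast, the remaining time is probably being spent in the logic that decides what to annotate.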

-----Original Message-----
From: rohan rai [mailto:hirohanin@gmail.com] 
Sent: Thursday, June 26, 2008 12:04 PM
To: uima-user@incubator.apache.org
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed

@Pascal: As I have already said, the timing does not scale linearly.
Secondly, the times I have specified are approximate.
I was talking about the actual adding of annotations to the CAS.
A record refers to, let's say, the content of tags like <a>.....</a>,
and the document consists of such records.
Annotation is done via this call:
    MyType annotation = new MyType(jCas);
This takes a lot of time, which is not acceptable.


On Thu, Jun 26, 2008 at 8:15 PM, LeHouillier, Frank D. <
Frank.LeHouillier@gd-ais.com> wrote:

> Just to clarify, what do you mean by "annotation"?  Is there a
> specific Analysis Engine that you are using? What is a "record"? Is
> this a document?  It would actually be surprising for many
> applications if annotation were not the bottleneck, given that some
> annotation processes are quite expensive, but this doesn't seem like
> what you mean here. I can't tell from your question whether it is the
> process that determines the annotations that is a burden or the actual
> adding of the annotations to the CAS.
> -----Original Message-----
> From: rohan rai [mailto:hirohanin@gmail.com]
> Sent: Thursday, June 26, 2008 7:36 AM
> To: uima-user@incubator.apache.org
> Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
> When I profile a UIMA application, I see that annotation takes a lot
> of time. Profiling shows that annotating 1 record takes around 0.06
> seconds. Now you may say that is good, but scale it up (although it
> does not scale up linearly). Here is a rough estimate from the
> experiments done:
> 6000 records take 6 min to annotate
> 800000 records take around 10 hrs min to annotate
> Which is
> One thing is that I am treating each record individually as a CAS. Even
> if I treat all the records as a single CAS it takes around 6-7 hrs,
> which is still not good in terms of speed.
> Is there a way out?
> Can I improve performance by any means?
> Regards
> Rohan
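
As a quick sanity check on the figures quoted above, the per-record cost for each batch can be worked out directly. This is only an arithmetic sketch: the `ThroughputCheck` class and its method names are hypothetical, and "around 10 hrs" is treated as exactly 10 hours.

```java
// Hypothetical helper: per-record cost implied by the figures reported in the thread.
final class ThroughputCheck {

    /** Average seconds spent per record for a batch. */
    static double secondsPerRecord(long records, double totalSeconds) {
        return totalSeconds / records;
    }

    /** Total hours a batch would take if the per-record cost stayed constant. */
    static double projectedHours(long records, double secondsPerRecord) {
        return records * secondsPerRecord / 3600.0;
    }

    public static void main(String[] args) {
        double small = secondsPerRecord(6_000, 6 * 60);       // 6000 records in 6 min
        double large = secondsPerRecord(800_000, 10 * 3600);  // 800000 records in about 10 hrs
        System.out.printf("small batch: %.3f s/record%n", small);
        System.out.printf("large batch: %.3f s/record%n", large);
        System.out.printf("linear projection for 800000 records: %.1f h%n",
                projectedHours(800_000, small));
    }
}
```

Under these assumptions the small batch works out to about 0.06 s per record and the large one to about 0.045 s per record, so the reported 10 hrs is somewhat below a straight linear projection from the 6000-record run.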
