uima-user mailing list archives

From "LeHouillier, Frank D." <Frank.LeHouill...@gd-ais.com>
Subject RE: Annotation (Indexing) a bottleneck in UIMA in terms of speed
Date Thu, 26 Jun 2008 17:19:53 GMT
To test your theory that it is the writing of Annotations to the CAS
that is taking so long, I ran an annotator with this code:

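import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TestAnnotator extends JCasAnnotator_ImplBase {

	@Override
	public void process(JCas jcas) throws AnalysisEngineProcessException {

		// Create and index 100,000 identical annotations to measure
		// the raw cost of adding annotations to the CAS.
		for (int i = 0; i < 100000; i++) {
			Annotation a = new Annotation(jcas);
			a.setBegin(1);
			a.setEnd(2);
			a.addToIndexes();
		}

		System.out.println("Done");
	}
}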

This takes less than two seconds to run on my laptop.  Is it possible
your bottleneck isn't where you think it is?
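One way to check where the time actually goes is to time each phase of
process() separately: finding the spans, then creating and indexing the
annotations. The sketch below is plain Java; PhaseTimer and timeMillis are
hypothetical helpers, not a UIMA API, and the loop is only a stand-in
workload for whichever phase you wrap in a Runnable.

```java
/**
 * Hypothetical helper, not part of UIMA: times one phase of work so that
 * span detection and annotation indexing can be measured separately.
 */
class PhaseTimer {

    /** Runs the task once and returns elapsed wall-clock time in milliseconds. */
    static double timeMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; in an annotator you might wrap the span-finding
        // loop in one call and the addToIndexes() loop in another.
        long[] sink = new long[1];
        double ms = timeMillis(() -> {
            for (int i = 0; i < 100_000; i++) {
                sink[0] += i; // write to a captured array so the work has a visible result
            }
        });
        System.out.printf("phase took %.3f ms (sum=%d)%n", ms, sink[0]);
    }
}
```

A single timed run includes JIT warm-up, so repeating the measurement or
using a profiler gives steadier numbers.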

-----Original Message-----
From: rohan rai [mailto:hirohanin@gmail.com] 
Sent: Thursday, June 26, 2008 12:04 PM
To: uima-user@incubator.apache.org
Subject: Re: Annotation (Indexing) a bottleneck in UIMA in terms of speed

@Pascal: As I have already said, the timing does not scale linearly.
         Secondly, these are approximate times which I have specified.
@Frank:
     I was talking about the actual adding of annotations to the CAS.
     A record refers to, let's say, text inside tags like <a>.....</a>,
     and the document consists of such records.
     Annotation is done via this method:
          MyType annotation = new MyType(jCas);
          annotation.setBegin(start);
          annotation.setEnd(end);
          annotation.addToIndexes();
     This takes a lot of time, which is undesirable.

Regards
Rohan
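The start and end offsets passed to setBegin()/setEnd() above have to come
from somewhere. A minimal sketch of locating the content of each <a>...</a>
record with a precompiled regular expression, assuming records are not
nested and use exactly "<a>" and "</a>"; RecordSpans and find are
hypothetical names. UIMA offsets are begin-inclusive and end-exclusive (the
covered text is documentText.substring(begin, end)), so Matcher.start/end
values can be used directly when the CAS document text is the same string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: finds the content offsets of each <a>...</a> record.
 * Assumes non-nested records delimited by the literal tags "<a>" and "</a>".
 */
class RecordSpans {

    // Reluctant (.*?) stops each match at the first closing tag;
    // DOTALL lets a record span multiple lines.
    private static final Pattern RECORD =
            Pattern.compile("<a>(.*?)</a>", Pattern.DOTALL);

    /** Returns {begin, end} character offsets of each record's content. */
    static List<int[]> find(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(1), m.end(1)});
        }
        return spans;
    }

    public static void main(String[] args) {
        String doc = "<a>x</a><a>yz</a>";
        for (int[] s : find(doc)) {
            // These offsets are what would be passed to setBegin()/setEnd().
            System.out.println(s[0] + ".." + s[1] + " = " + doc.substring(s[0], s[1]));
        }
        // prints "3..4 = x" then "11..13 = yz"
    }
}
```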


On Thu, Jun 26, 2008 at 8:15 PM, LeHouillier, Frank D. <
Frank.LeHouillier@gd-ais.com> wrote:

> Just to clarify, what do you mean by "annotation"?  Is there a
> specific Analysis Engine that you are using? What is a "record"? Is
> this a document?  It would actually be surprising for many
> applications if annotation were not the bottleneck, given that some
> annotation processes are quite expensive, but this doesn't seem like
> what you mean here. I can't tell from your question whether it is the
> process that determines the annotations that is a burden or the actual
> adding of the annotations to the CAS.
>
> -----Original Message-----
> From: rohan rai [mailto:hirohanin@gmail.com]
> Sent: Thursday, June 26, 2008 7:36 AM
> To: uima-user@incubator.apache.org
> Subject: Annotation (Indexing) a bottleneck in UIMA in terms of speed
>
> When I profile a UIMA application, I see that annotation takes a lot
> of time: annotating 1 record takes around 0.06 seconds. Now you may say
> that's good, but scale it up (although it does not scale up linearly).
> Here is a rough estimate based on experiments: 6000 records take 6 min
> to annotate, and 800000 records take around 10 hrs to annotate, which
> is bad.
> One thing is that I am treating each record individually as a CAS. Even
> if I treat all the records as a single CAS, it takes around 6-7 hrs,
> which is still not good in terms of speed.
>
> Is there a way out?
> Can I improve performance by any means?
>
> Regards
> Rohan
>
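Working through the figures in the original message (taking 6 min as 360 s
and 10 hrs as 36,000 s) gives the per-record rate at each scale, plus a
linear extrapolation from the small run:

```latex
\begin{aligned}
\frac{360\ \text{s}}{6000\ \text{records}} &= 0.06\ \text{s/record} \\
\frac{36{,}000\ \text{s}}{800{,}000\ \text{records}} &= 0.045\ \text{s/record} \\
800{,}000 \times 0.06\ \text{s} &= 48{,}000\ \text{s} \approx 13.3\ \text{h}
\end{aligned}
```

So the reported totals imply a per-record cost that falls slightly as the
batch grows, and the hours come mainly from the per-record cost itself. By
comparison, Frank's test indexes 100,000 annotations in under 2 s, i.e.
under 20 microseconds each; assuming comparable hardware, 0.06 s per record
would only be explained by indexing alone if each record produced roughly
3,000 or more annotations.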
