incubator-ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: assistance with dictionary lookup issue
Date Tue, 05 Feb 2013 14:44:55 GMT

Looks good to me, with one question.

Instead of getting an iterator and then building a new list, can we just skip getting the
iterator and use the list that selectCovered returns?

I will mock up a diff here of what I mean:
-	Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
-	while (btaItr.hasNext())
-	{
-		BaseToken bta = (BaseToken) btaItr.next();
-			ltList.add(lt);
-		}
-	}

+	ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
	
	return ltList;
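
To illustrate the point of the diff above, here is a minimal, self-contained sketch. It does not use uimaFIT or cTAKES: `Token` and the local `selectCovered` are hypothetical stand-ins for `BaseToken` and `JCasUtil.selectCovered`, written only so the example compiles on its own. It shows that when the loop merely copies elements, returning the list directly gives the same result without the extra iterator and copy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CoveredTokensSketch {
    // Hypothetical stand-in for BaseToken: just begin/end offsets.
    static final class Token {
        final int begin;
        final int end;
        Token(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Hypothetical stand-in for JCasUtil.selectCovered: returns the tokens
    // that lie entirely inside the window [begin, end].
    static List<Token> selectCovered(List<Token> all, int begin, int end) {
        List<Token> covered = new ArrayList<Token>();
        for (Token t : all) {
            if (t.begin >= begin && t.end <= end) {
                covered.add(t);
            }
        }
        return covered;
    }

    // "Before": get an iterator, then copy every element into a new list.
    static List<Token> viaIterator(List<Token> all, int begin, int end) {
        List<Token> out = new ArrayList<Token>();
        Iterator<Token> it = selectCovered(all, begin, end).iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // "After": use the returned list as-is.
    static List<Token> direct(List<Token> all, int begin, int end) {
        return selectCovered(all, begin, end);
    }
}
```

Note that the shortcut only applies when no per-element conversion is needed. If the loop body builds a different object from each token (the mock-up adds `lt`, not `bta`), a loop over the returned list is still required, though the explicit `Iterator` can be replaced by a for-each.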

I know you said it was quick and dirty at the moment. My 2 cents: unless someone comes up
with a better engineered solution, I think we could add the new method (with a name like getLookupTokens)
and leave the old one so we don't have to deprecate anything, then phase in the change to the
various *LookupInitializerImpl classes if needed.
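
One possible reading of "add the new method and leave the old one" is sketched below. Everything here except the method name `getLookupTokens` is hypothetical: the interface names, the `String` token type, and the index-based window are placeholders, not the actual cTAKES interfaces. The sketch assumes the new method lives on a separate extension interface so that existing implementations keep compiling and can be migrated one at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical existing interface; left untouched, nothing deprecated.
interface TokenSource {
    Iterator<String> getTokenIterator();
}

// Hypothetical extension interface carrying the new method.
interface WindowedTokenSource extends TokenSource {
    List<String> getLookupTokens(int begin, int end);
}

// An implementation that has not been migrated yet.
class LegacySource implements TokenSource {
    protected final List<String> tokens;
    LegacySource(List<String> tokens) { this.tokens = tokens; }
    public Iterator<String> getTokenIterator() { return tokens.iterator(); }
}

// A migrated implementation: it answers window queries directly.
class MigratedSource extends LegacySource implements WindowedTokenSource {
    MigratedSource(List<String> tokens) { super(tokens); }
    public List<String> getLookupTokens(int begin, int end) {
        return new ArrayList<String>(tokens.subList(begin, end));
    }
}

class PhaseInSketch {
    // Caller prefers the new method and falls back to the old iterator.
    static List<String> tokensFor(TokenSource src, int begin, int end) {
        if (src instanceof WindowedTokenSource) {
            return ((WindowedTokenSource) src).getLookupTokens(begin, end);
        }
        // Fallback: walk everything, keep positions in [begin, end).
        List<String> out = new ArrayList<String>();
        Iterator<String> it = src.getTokenIterator();
        for (int i = 0; it.hasNext(); i++) {
            String t = it.next();
            if (i >= begin && i < end) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Adding the method to the existing interface itself would instead force every implementing class to change at once, which is the situation the patch description flags as not yet handled.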

-- James


> -----Original Message-----
> From: ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Masanz, James J.
> Sent: Monday, February 04, 2013 4:01 PM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: assistance with dictionary lookup issue
> 
> I'll take a look at the patch. Also be aware of
> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
> enhancing performance  -- if willing to assume annotations (BaseTokens
> currently) are sorted. Currently it's always BaseToken and always sorted,
> just not sure if we want to code to that assumption.
> 
> ________________________________________
> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> Sent: Monday, February 04, 2013 3:43 PM
> To: ctakes-dev@incubator.apache.org
> Subject: assistance with dictionary lookup issue
> 
> Pei helped me track down an issue with performance I'd noticed in the
> dictionary annotator, and I have filed the issue here:
> https://issues.apache.org/jira/browse/CTAKES-143
> 
> I implemented a quick and dirty proof of concept fix and noticed dramatic
> performance improvement.  I attached the patch to the issue, but it
> involves changing an interface (currently does not try to fix other
> implementing classes so obviously not ready for primetime), so I wanted to
> solicit the list first in case anyone with better knowledge of that module
> has some better engineering ideas than what I came up with.
> 
> Thanks,
> 
> --
> Tim Miller, PhD
> Postdoctoral Research Fellow
> Children's Hospital Informatics Program
> Children's Hospital Boston and Harvard Medical School
> 617-919-1223
