ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: CTAKES-248- include original covered text of NEs which can't be recovered post if NE is from a disjoint span
Date Wed, 02 Oct 2013 14:21:28 GMT
+1 to having a pointer back to the BaseToken(s) rather than a "|"-delimited String (so we
could get back the spans and other info if needed).
I think the atom will be slightly different, though. Perhaps an example helps:
Sentence/LookupWindow: "alcoholic liver disease was acute."
originalText: "disease acute" [New feature to store the Tokens that were matched due to the
permutations?]
UmlsConcept.cui: C0001314
UmlsConcept.preferredText: "Acute Disease" [New feature to store the atom/text returned by
the UMLS CUI]
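
To make the difference concrete, here is a minimal sketch in plain Java. The classes Token and Mention, and the helper tokenOf, are hypothetical stand-ins made up for illustration; they are not the cTAKES or UIMA types, and this is not how the type system change is implemented. It only shows why references to the matched tokens recover both the matched text and the spans, while the begin/end offsets alone do not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the cTAKES/UIMA classes.
final class DisjointSpanSketch {

    /** A token identified by its character offsets into the document text. */
    static final class Token {
        final int begin;
        final int end;

        Token(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** A mention that keeps references to the tokens that actually matched. */
    static final class Mention {
        final List<Token> matchedTokens; // assumed to be in document order

        Mention(List<Token> matchedTokens) {
            this.matchedTokens = new ArrayList<>(matchedTokens);
        }

        int begin() {
            return matchedTokens.get(0).begin;
        }

        int end() {
            return matchedTokens.get(matchedTokens.size() - 1).end;
        }

        /** Everything between begin and end, including unmatched words. */
        String coveredText(String doc) {
            return doc.substring(begin(), end());
        }

        /** Only the words that took part in the dictionary match. */
        String matchedText(String doc) {
            List<String> parts = new ArrayList<>();
            for (Token t : matchedTokens) {
                parts.add(t.coveredText(doc));
            }
            return String.join(" ", parts);
        }
    }

    /** Builds a token for the first occurrence of a word in the text. */
    static Token tokenOf(String doc, String word) {
        int begin = doc.indexOf(word);
        if (begin < 0) {
            throw new IllegalArgumentException("word not found: " + word);
        }
        return new Token(begin, begin + word.length());
    }

    public static void main(String[] args) {
        String doc = "alcoholic liver disease was acute.";
        Mention m = new Mention(Arrays.asList(tokenOf(doc, "disease"), tokenOf(doc, "acute")));

        System.out.println(m.coveredText(doc)); // disease was acute
        System.out.println(m.matchedText(doc)); // disease acute
    }
}
```

Because the mention holds the tokens themselves, no delimiter has to be chosen or escaped, and each token's offsets stay available to downstream code.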

I also ran into a similar case where I wish IdentifiedAnnotation.segmentID/SentenceID was
the actual Segment type and not a String.

This is just my 2 cents... open to ideas though.
--Pei


> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:richard.eckart@gmail.com]
> Sent: Wednesday, October 02, 2013 3:19 AM
> To: dev@ctakes.apache.org
> Subject: Re: CTAKES-248- include original covered text of NEs which can't be
> recovered post if NE is from a disjoint span
> 
> What benefit would it have to store a string with some separation character
> (which may mean that the separation character in the elements may need to
> be escaped), over using a feature of type FSArray<Token> pointing to the
> original segments?
> 
> Not sure if that is what Karthik meant when referring to fetching the
> matched atom.
> 
> -- Richard
> 
> On 02.10.2013, at 01:46, Karthik Sarma <ksarma@ksarma.com> wrote:
> 
> > Hmm, couldn't you just fetch the matched atom and use that? Should be
> > the same information (without, I suppose, the original ordering and split).
> >
> > --
> > Karthik Sarma
> > UCLA Medical Scientist Training Program Class of 20??
> > Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> > to the House of Delegates of the American Medical Association
> > ksarma@ksarma.com
> > gchat: ksarma@gmail.com
> > linkedin: www.linkedin.com/in/ksarma
> >
> >
> > On Tue, Oct 1, 2013 at 12:37 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
> >
> >> Yes, this would help address that multiple permutations example.  The
> >> new getOriginalText method would return something like
> >> "Acute|Disease".  Right now I'm thinking of just using vertical bar
> >> as delimiter, to start with at least, but think it should be configurable.
> >>
> >> -----Original Message-----
> >> From: dev-return-2067-Masanz.James=mayo.edu@ctakes.apache.org
> >> [mailto:dev-return-2067-Masanz.James=mayo.edu@ctakes.apache.org]
> >> On Behalf Of Chen, Pei
> >> Sent: Tuesday, October 01, 2013 9:38 AM
> >> To: dev@ctakes.apache.org
> >> Subject: CTAKES-248- include original covered text of NEs which can't
> >> be recovered post if NE is from a disjoint span
> >>
> >> This sounds pretty cool.
> >> James, will this address the multiple permutations lookup example:
> >> "Acute alcoholic liver disease."  There is a cui: C0001314: Acute
> >> Disease, but if you getCoveredText(), on the UMLSConcept, you would
> >> actually get the same "Acute alcoholic liver disease" instead of
> >> "Acute Disease".
> >> So, there is a new field called getOriginalText() that matched the hit?
> >>
> >>> -----Original Message-----
> >>> From: james-masanz@apache.org [mailto:james-masanz@apache.org]
> >>> Sent: Monday, September 30, 2013 5:49 PM
> >>> To: commits@ctakes.apache.org
> >>> Subject: svn commit: r1527792 - /ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>>
> >>> Author: james-masanz
> >>> Date: Mon Sep 30 21:48:01 2013
> >>> New Revision: 1527792
> >>>
> >>> URL: http://svn.apache.org/r1527792
> >>> Log:
> >>> CTAKES-248  - for named entities, since the annotation just has the begin and
> >>> end offset, it is requested to have a way to get the original covered text
> >>> (especially for disjoint spans) so it is possible to know which words in the
> >>> covered text were actually used in the matching to the dictionary entry
> >>>
> >>> Modified:
> >>>    ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>>
> >>> Modified: ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
> >>> URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml?rev=1527792&r1=1527791&r2=1527792&view=diff
> >>> ==============================================================================
> >>> Binary files - no diff available.

