ctakes-user mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: LookupDesc_DrugNER.xml
Date Thu, 08 Aug 2013 16:27:43 GMT
Hi Larry,

dictionaries/dictionary/metaFields 
These determine which fields are available for setting a property within lookupInitializer
or lookupConsumer

dictionaries/dictionary/excludeList
Anything listed in the excludeList is ignored during dictionary lookup.
This is used when the dictionary contains something we are not interested in having annotated,
or when a dictionary term is used to mean something else so much more often that we decide
to skip annotating it. For example, since cTAKES ignores case during lookup, "Dr. Smith"
would normally result in "Dr" being marked as diabetic retinopathy. Rather than having "Dr."
marked incorrectly, we ignore all occurrences of "dr" (or "Dr" or "DR") by using the
excludeList. This also ignores the cases where DR really is used to mean diabetic
retinopathy; the rationale is that if diabetic retinopathy is an important concept for the
document, it will hopefully be spelled out somewhere within the document.
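
As a rough illustration of the excludeList behavior described above, here is a minimal
Python sketch. It is not cTAKES code: DICTIONARY, EXCLUDE_LIST, and lookup are hypothetical
names, and stripping a trailing period is an assumed stand-in for real tokenization.

```python
# Hypothetical sketch, not cTAKES code: a case-insensitive dictionary lookup
# that skips any term found on an exclude list.
DICTIONARY = {
    "dr": "diabetic retinopathy",
    "diabetic retinopathy": "diabetic retinopathy",
    "aspirin": "aspirin",
}
EXCLUDE_LIST = {"dr"}


def lookup(term):
    """Return the dictionary entry for term, or None if excluded or unknown."""
    # Lookup ignores case; dropping a trailing period is an assumed simplification.
    key = term.lower().rstrip(".")
    if key in EXCLUDE_LIST:
        # Excluded terms are never annotated, whatever they mean in context.
        return None
    return DICTIONARY.get(key)
```

With this sketch, "Dr.", "dr", and "DR" all return None, while the spelled-out
"Diabetic Retinopathy" is still found.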

lookupBindings/lookupBinding/lookupInitializer/properties
windowAnnotations - yes, this specifies which annotation type to perform lookups within
exclusionTags - these are part-of-speech tags; tokens tagged with any of these are ignored
maxPermutationLevel - affects the number of permutations of word orderings that are searched
for multi-word dictionary entries.
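
To make the effect of exclusionTags and maxPermutationLevel more concrete, here is a small
Python sketch. It is not the cTAKES initializer: EXCLUSION_TAGS and candidate_phrases are
hypothetical names, the tag values are examples only, and treating the level as the maximum
number of words per ordering is an assumption made for illustration.

```python
from itertools import permutations

# Hypothetical sketch, not cTAKES code. The tags below are example values only.
EXCLUSION_TAGS = {"VB", "IN", "DT"}


def candidate_phrases(tagged_tokens, max_permutation_level):
    """Build candidate lookup phrases from (word, pos_tag) pairs.

    Tokens whose tag is in EXCLUSION_TAGS are dropped. Every ordering of
    1..max_permutation_level of the remaining words is then produced
    (assumed interpretation of the permutation level).
    """
    words = [word.lower() for word, tag in tagged_tokens if tag not in EXCLUSION_TAGS]
    phrases = set()
    largest = min(max_permutation_level, len(words))
    for size in range(1, largest + 1):
        for ordering in permutations(words, size):
            phrases.add(" ".join(ordering))
    return phrases
```

In this sketch, raising the level makes more word orderings eligible to match a multi-word
entry, at the cost of generating many more candidates.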

lookupConsumer
I do see typeIdField in my LookupDesc_DrugNER.xml.
Defining a property for each of the fields in your Lucene index that you want available to
the consumer is the right thing to do.
This is here so that if your Lucene index has more fields than you need for cTAKES, you can
have cTAKES retrieve just the fields you need.
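
The idea of retrieving only the fields you need can be sketched in a few lines of Python.
This is not the cTAKES consumer or the Lucene API: project_fields is a hypothetical helper,
and the record is modeled as a plain dict purely for illustration.

```python
# Hypothetical sketch, not cTAKES or Lucene code.
def project_fields(index_record, wanted_fields):
    """Return a copy of index_record holding only the fields in wanted_fields.

    Fields that are requested but absent from the record are skipped.
    """
    return {name: index_record[name] for name in wanted_fields if name in index_record}
```

For example, a record with four stored fields and a wanted list of two yields a dict with
just those two entries; the rest never reach the consumer.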

Hope that helps. If still more explanation would be useful, let me know.

-- James

-----Original Message-----
From: user-return-244-Masanz.James=mayo.edu@ctakes.apache.org [mailto:user-return-244-Masanz.James=mayo.edu@ctakes.apache.org]
On Behalf Of Kline, Larry D
Sent: Wednesday, August 07, 2013 12:48 PM
To: user@ctakes.apache.org
Subject: RE: LookupDesc_DrugNER.xml

Thanks James.  I took the LookupDesc_DrugNER.xml file that came with
cTAKES (2.5) and slightly modified it.  It seems to work, but I'd just
like to know what the fields mean.  For example:

* dictionaries/dictionary/metaFields
No idea what these do.  I left them the same.

* dictionaries/dictionary/excludeList
I assume these words are excluded from consideration when looking up a
string in the dictionary

* lookupBindings/lookupBinding/lookupInitializer/properties
I read through some of the code of the initializer.  I can see what it's
doing, but exactly how these fields affect the results is not obvious to
me.  I guess windowAnnotations specifies which annotation type to
perform lookups within.  The others I don't understand.

* lookupBindings/lookupBinding/lookupConsumer
I defined a property for each of the fields stored in my lucene index,
but I'm not sure if I needed to do that. In the default implementation
the field typeIdField is used in the lookup consumer, but it is defined
nowhere in the xml file.

Some background: I build my own Lucene index from tables of FDB data
that we maintain locally.  So I'm not looking anything up in UMLS.  I've
defined my own lookupConsumer that gets the data from Lucene and I
defined my own DrugOntologyConcept (subtype of OntologyConcept) to hold
that information.

Thanks,
Larry

-----Original Message-----
From: Masanz, James J. [mailto:Masanz.James@mayo.edu] 
Sent: Wednesday, August 07, 2013 9:59 AM
To: 'user@ctakes.apache.org'
Subject: RE: LookupDesc_DrugNER.xml

There's a very brief description of the file on

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Dictionary+Lookup

"A LookupDescriptorFile such as lookup/LookupDesc.xml, found in
resources/, defines the dictionary(s) used, and the classes that
interact with the dictionary(s). The implementation tag identifies the
type of dictionary: Lucene index (luceneImpl), database (jdbcImpl), or
delimited flat file (csvImpl). See class
org.apache.ctakes.dictionary.lookup.ae.LookupParseUtilities.java for
implementation details."

There are a few comments within the file. But as far as the specifics of
the individual elements, if you describe what you'd like to do, I or
someone else on this list should be able to help.

-- James


From: user-return-241-Masanz.James=mayo.edu@ctakes.apache.org
[mailto:user-return-241-Masanz.James=mayo.edu@ctakes.apache.org] On
Behalf Of Kline, Larry D
Sent: Wednesday, August 07, 2013 11:45 AM
To: user@ctakes.apache.org
Subject: LookupDesc_DrugNER.xml

Can anyone tell me where I can find a description of the format of this
file?

