incubator-ctakes-commits mailing list archives

Subject svn commit: r838545 - in /websites/staging/ctakes/trunk/content: ./ ctakes/2.6.0/ctakes-2.6-Dictionary-Lookup.html
Date Thu, 15 Nov 2012 22:52:08 GMT
Author: buildbot
Date: Thu Nov 15 22:52:07 2012
New Revision: 838545

Staging update by buildbot for ctakes

    websites/staging/ctakes/trunk/content/   (props changed)

Propchange: websites/staging/ctakes/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 15 22:52:07 2012
@@ -1 +1 @@

Added: websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Dictionary-Lookup.html
--- websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Dictionary-Lookup.html (added)
+++ websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Dictionary-Lookup.html Thu Nov 15 22:52:07 2012
@@ -0,0 +1,216 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+<link href="/ctakes/css/ctakes.css" rel="stylesheet" type="text/css">
+<title>cTAKES 2.6 Dictionary Lookup</title>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <div class="banner">
+      <div id="bannerleft">
+		<a href=""><img src="" alt="The Apache Software Foundation" border="0"/></a>
+	<br/>
+			<img alt="cTAKES logo" src="/ctakes/images/ctakes_logo.jpg" border="0"/>
+      </div>  
+    <div id="bannerright">	
+	      <img id="asf-logo" alt="Apache Incubator" src="" />
+	  </div>
+ </div>  
+  <div id="clear"></div>
+  <div id="sidenav">
+    <h1 id="general">General</h1>
+<li><a href="/ctakes/index.html">About</a></li>
+<li><a href="/ctakes/gettingstarted.html">Getting Started</a></li>
+<li><a href="/ctakes/downloads.html">Downloads</a></li>
+<li><a href="/ctakes/glossary.html">Glossary</a></li>
+<h1 id="community">Community</h1>
+<li><a href="/ctakes/get-involved.html">Get Involved</a></li>
+<li><a href="">Bug Tracker</a></li>
+<li><a href="/ctakes/mailing-lists.html">Mailing Lists</a></li>
+<li><a href="/ctakes/people.html">People</a></li>
+<li><a href="">Incubator page</a></li>
+<li><a href="/ctakes/license.html">License</a></li>
+<li><a href="/ctakes/history.html">History</a></li>
+<li><a href="/ctakes/community-faqs.html">Community FAQs</a></li>
+<h1 id="users">Users</h1>
+<li><a href="/ctakes/userguide.html">User Guide</a></li>
+<li><a href="/ctakes/user-faqs.html">User FAQs</a></li>
+<h1 id="developers">Developers</h1>
+<li><a href="/ctakes/developerguide.html">Developer Guide</a></li>
+<li><a href="/ctakes/developer-faqs.html">Developer FAQs</a></li>
+<h1 id="ppmc">PPMC</h1>
+<li><a href="/ctakes/ppmc-faqs.html">PPMC FAQs</a></li>
+<li><a href="/ctakes/ctakes-release-guide.html">Release Guide</a> <br /></li>
+<h1 id="asf">ASF</h1>
+<li><a href="">Apache Software Foundation</a></li>
+<li><a href="">Thanks</a></li>
+<li><a href="">Become a Sponsor</a></li>
+  </div>
+  <div id="contenta">
+    <h1 id="ctakes-26-dictionary-lookup">cTAKES 2.6 - Dictionary Lookup</h1>
+<h2 id="overview-of-dictionary-lookup">Overview of Dictionary Lookup</h2>
+<p>The dictionary lookup annotator finds the entries from one or more
+dictionaries that match the document text in some way. Within this annotator,
+these matches are called lookup hits.</p>
+<p>The dictionary lookup annotator is very customizable. It can look for matches
+where the words in the dictionary entries appear in the same order as the
+words in the document text, or it can look for permutations of the words from
+the dictionary. Moreover, it can look just for exact matches of the words, or
+it can also look for matches to the canonical forms of the words.</p>
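+<p>The matching options described above can be illustrated with a minimal
+Python sketch. It is not the cTAKES implementation: the function names, the
+tiny canonical-form table, and the requirement that a permuted match be
+contiguous are all assumptions made for illustration.</p>

```python
# Hypothetical canonical-form table; a real pipeline would obtain canonical
# forms from a lexical normalizer rather than a hard-coded dict.
CANONICAL = {"pains": "pain", "knees": "knee"}


def canonical(token):
    """Return the canonical form of a token (lowercased table lookup)."""
    token = token.lower()
    return CANONICAL.get(token, token)


def find_hits(window_tokens, entry_tokens, ordered=True, use_canonical=False):
    """Return the start offsets in the window where the entry matches.

    ordered=True requires the entry's words in dictionary order.
    ordered=False accepts any permutation of them (assumed contiguous here).
    use_canonical=True compares canonical forms instead of exact tokens.
    """
    norm = canonical if use_canonical else (lambda t: t)
    window = [norm(t) for t in window_tokens]
    entry = [norm(t) for t in entry_tokens]
    n = len(entry)
    hits = []
    for start in range(len(window) - n + 1):
        span = window[start:start + n]
        if span == entry or (not ordered and sorted(span) == sorted(entry)):
            hits.append(start)
    return hits
```

+<p>With the window tokens "patient reports pain knee today" and the entry
+"knee pain", the ordered search finds nothing, while the permutation search
+reports a hit at the offset of "pain".</p>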
+<p>Searches for a lookup hit are limited to within windows, where the window type
+is defined in the LookupDescriptorFile. A window can be the words that fall
+within the same Sentence, the same Chunk, the same LookupWindowAnnotation or
+any other annotation. See the clinical documents pipeline project for an
+example of an analysis engine (LookupWindowAnnotator.xml) that creates
+LookupWindowAnnotation annotations.</p>
+<h2 id="implementation-of-dictionry-lookup">Implementation of Dictionary Lookup</h2>
+<p>Starting with version 1.3, cTAKES includes UMLS (SNOMED CT and RxNorm)
+dictionaries. To use those dictionaries, you must have a UMLS username and
+password, and an internet connection (to verify your UMLS username and
+password). If you do not have a UMLS username and/or are not interested in
+those dictionaries, you can build your own or use the small sample
+dictionaries (see below).</p>
+<p>The behavior of the dictionary lookup annotator is controlled by the
+parameters and resources defined in the analysis engine descriptor, and by the
+contents of the resource called the LookupDescriptorFile.</p>
+<p>For example, if the analysis engine descriptor DictionaryLookup.xml contains a
+resource named LookupDescriptorFile with value lookup/LookupDesc.xml, then the
+parameter settings and resources named within DictionaryLookup.xml, together
+with the values within lookup/LookupDesc.xml will control the actions of the
+dictionary lookup annotator.</p>
+<p>The lookupInitializer and lookupConsumer classes are specified within the
+LookupDescriptorFile. The algorithm used for looking up the terms is defined
+by the lookupInitializer, which creates the lookup hits. The lookupConsumer
+adds annotations to the CAS for some or all of the lookup hits.</p>
+<p>As an example of adding only some of the lookup hits to the CAS, suppose you
+have a dictionary of RxNorm terms with their RxNorm codes and a dictionary of
+terms from the OrangeBook, and you want to create annotations only for those
+OrangeBook terms that also have an RxNorm code.</p>
+<p>This can be done using class as the
+lookupInitializer, and using class OrangeBookFilterConsumerImpl as the
+lookupConsumer, provided you have the RxNorm dictionary, and you configure the
+LookupDescriptorFile resource to use your RxNorm dictionary.</p>
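+<p>The division of labor between the two classes can be sketched in Python as
+follows. This is only an illustration under stated assumptions: the function
+names are hypothetical, the dictionaries are in-memory stand-ins, and the
+codes are made-up placeholders rather than real RxNorm codes.</p>

```python
# Made-up data for illustration; the codes are placeholders, not RxNorm codes.
RXNORM = {("aspirin",): "RX-0001", ("ibuprofen",): "RX-0002"}
ORANGE_BOOK = {("aspirin",)}


def initialize_hits(tokens, dictionary):
    """Stand-in for a lookupInitializer: create a hit for every dictionary
    entry whose tokens appear, in order, in the given tokens."""
    lowered = [t.lower() for t in tokens]
    hits = []
    for entry, code in dictionary.items():
        n = len(entry)
        for start in range(len(lowered) - n + 1):
            if tuple(lowered[start:start + n]) == entry:
                hits.append(
                    {"start": start, "end": start + n, "entry": entry, "code": code}
                )
    return hits


def consume_hits(hits, allowed_entries):
    """Stand-in for a filtering lookupConsumer: keep only the hits whose
    entry also appears in a second dictionary."""
    return [hit for hit in hits if hit["entry"] in allowed_entries]
```

+<p>For the tokens of "Patient takes aspirin and ibuprofen", the first function
+produces two hits and the second keeps only the one for aspirin, because
+ibuprofen is absent from the sample OrangeBook set.</p>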
+<p><img alt="" src="/images/icons/emoticons/check.png" /></p>
+<p><strong>Tip</strong><br />
+<p>Dictionary entries must be tokenized the same way the pipeline tokenizes
+the document text. For example, the lookup algorithm will not find a lookup
+hit for the dictionary entry "ear, skin", even when the document contains the
+same text ("ear, skin"), if the pipeline has tokenized that text as the three
+tokens "ear" "," "skin". To find a lookup hit for the three tokens, the
+dictionary entry should be tokenized, with a space before the comma: "ear ,
+skin".</p>
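+<p>The tokenization mismatch in this tip can be reproduced with a short Python
+sketch. The regular-expression tokenizer is a simplification chosen for
+illustration and is not the cTAKES tokenizer; the function names are
+hypothetical.</p>

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def entry_matches(entry, doc_tokens):
    """Check whether a dictionary entry, stored as space-separated tokens,
    equals some contiguous run of document tokens."""
    entry_tokens = entry.split()
    n = len(entry_tokens)
    return any(
        doc_tokens[start:start + n] == entry_tokens
        for start in range(len(doc_tokens) - n + 1)
    )
```

+<p>Here the entry "ear, skin" splits into the two tokens "ear," and "skin" and
+therefore never lines up with the document's three tokens, whereas the entry
+"ear , skin" does.</p>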
+<p><img alt="" src="/images/icons/emoticons/check.png" /></p>
+<p><strong>Tip</strong><br />
+<p>Editing dictionary lookup AE descriptors in Eclipse</p>
+<p>The analysis engine descriptors for this annotator use elements of type
+configurableDataResourceSpecifier. These cannot be modified from the
+Parameters or Resources tabs of the Component Descriptor Editor (at least not
+in UIMA 2.2). To view these values or edit them, use the Sources tab or open
+the descriptor with a text editor.</p>
+<p>To determine the LookupDescriptorFile for an analysis engine, open the
+analysis engine descriptor (e.g. DictionaryLookupannotator.xml) and note the
+URL for the LookupDescriptorFile resource (e.g. lookup/LookupDesc.xml).</p>
+<p>A LookupDescriptorFile such as lookup/LookupDesc.xml, found in resources/,
+defines the dictionaries used and the classes that interact with them. The
+implementation tag identifies the type of dictionary:
+Lucene index (luceneImpl), database (jdbcImpl), or delimited flat file
+(csvImpl). See class for
+implementation details.</p>
+<p><img alt="" src="/images/icons/emoticons/check.png" /></p>
+<p><strong>Tip</strong><br />
+<p>To better understand the dictionary lookup annotator code, you could start
+by reading the Javadoc API for its classes.</p>
+<h3 id="dictionarylookupannotatorumlsxml">DictionaryLookupAnnotatorUMLS.xml</h3>
+<p>This uses the bundled UMLS (SNOMED CT and RxNorm) dictionaries. Before using
+this analysis engine descriptor, update the UMLSUser and UMLSPW parameters
+within this descriptor with your UMLS username and password. You will need to
+have an active connection to the internet so your UMLS username and password
+can be verified.</p>
+<h3 id="dictionarylookupannotatorxml">DictionaryLookupAnnotator.xml</h3>
+<p>This uses the small sample dictionaries. This annotator can be run
+out-of-the-box without modifying any parameters, but annotates a very limited
+set of terms such as carcinoma, aspirin, knee, and pain.</p>
+<h3 id="dictionarylookupannotatorcsvxml">DictionaryLookupannotatorCSV.xml</h3>
+<p>This is an example of how to use a dictionary contained in a delimited file
+rather than a database or a Lucene index. This is only recommended for small
+dictionaries.</p>
+<h3 id="dictionarylookupannotatordbxml">DictionaryLookupannotatorDB.xml</h3>
+<p>This is a skeleton of how you could use a dictionary contained in a database
+that can be accessed via a JDBC driver instead of using a Lucene index or flat
+file.</p>
+<p>Refer also to the page on <a href="">dictionaries in the cTAKES documentation on SourceForge</a>.</p>
+<h2 id="sample-dictionaries">Sample dictionaries</h2>
+<p>This project includes two sample dictionaries that are used by default:</p>
+<p>(1) a sample database (a Lucene index) containing a few drug names</p>
+<p>(2) a sample database (using 2 Lucene indexes) containing a few anatomical
+sites, procedures, and disorders/diseases</p>
+<p>These can be used to verify your cTAKES install and to give a small flavor of
+what cTAKES can do, and unlike the bundled UMLS dictionaries, do not require a
+UMLS username or an internet connection.</p>
+<p>The programs used to create these Lucene indexes are scripts/java/edu/mayo/bmi
+/dictionarytools/ and scripts/java/edu/ma
+<p><img alt="" src="/images/icons/emoticons/check.png" /></p>
+<p><strong>Tip</strong><br />
+<p>To view the contents of a Lucene index, you could use a tool such as Luke.</p>
+<h2 id="creating-your-own-dictionaries">Creating your own dictionaries</h2>
+<p>To create a dictionary yourself, you could download a copy of the UMLS
+Metathesaurus and build upon the program mentioned above to create a Lucene
+index of the desired vocabulary.</p>
+<p>Alternatively, you could use a different program in that same package that
+reads from a pipe-delimited file:</p>
+<p>scripts/java/edu/mayo/bmi/dictionarytools to create a Lucene index.</p>
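+<p>The general idea of turning a pipe-delimited file into a searchable index
+can be sketched in Python. This is not the program referenced above and it
+does not build a Lucene index: the two-field "code|term" layout, the function
+name, and the in-memory index keyed by first word are all assumptions made for
+illustration.</p>

```python
import csv
from collections import defaultdict


def build_index(lines):
    """Read 'code|term' rows (assumed layout) and map each term's first
    token to the list of (term tokens, code) entries that start with it."""
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter="|"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code, term = row[0].strip(), row[1].strip()
        tokens = tuple(term.lower().split())
        if tokens:
            index[tokens[0]].append((tokens, code))
    return index
```

+<p>Keying the index by first word means a lookup only has to compare the
+entries that begin with the token currently being examined. The function
+accepts any iterable of lines, so an open file object can be passed
+directly.</p>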
+  </div>
+ <div id="footera">
+    <div id="copyrighta">
+      <p>Copyright &#169; 2011 The Apache Software Foundation, Licensed under the <a href="">Apache License, Version 2.0</a>.<br/>Apache and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
+    </div>
+ </div>
