incubator-ctakes-commits mailing list archives

Subject svn commit: r838482 - in /websites/staging/ctakes/trunk/content: ./ ctakes/2.6.0/user-guide-2.6.html
Date Thu, 15 Nov 2012 20:16:17 GMT
Author: buildbot
Date: Thu Nov 15 20:16:16 2012
New Revision: 838482

Staging update by buildbot for ctakes

    websites/staging/ctakes/trunk/content/   (props changed)

Propchange: websites/staging/ctakes/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 15 20:16:16 2012
@@ -1 +1 @@

Modified: websites/staging/ctakes/trunk/content/ctakes/2.6.0/user-guide-2.6.html
--- websites/staging/ctakes/trunk/content/ctakes/2.6.0/user-guide-2.6.html (original)
+++ websites/staging/ctakes/trunk/content/ctakes/2.6.0/user-guide-2.6.html Thu Nov 15 20:16:16
@@ -83,15 +83,362 @@
   <div id="contenta">
-    <h1 id="ctakes-26-user-guide">cTAKES 2.6 User Guide</h1>
-<p>This does not include package name updates to reflect, and maven was
not used to generate the build. The build was done like the build for 2.5.</p>
-<p>The "install" is to just unzip the archive.</p>
-<p>To run cTAKES, you can start the CVD (or CPE) GUI using the script files [SH|BAT]
files found within the top level directory.
-Once in the CVD GUI, select an aggregate to load, such as cTAKESdesc/cpddesc/analysis_engine/AggregatePlaintextProcessor.xml,
then run the aggregate you just loaded using the menu options.</p>
-<p>The archive includes source, compiled class files, and a jar.</p>
-<p>The source from which this release was built is in SVN at</p>
-<p>Also included there, within "files for pipeline root", are the ANT scripts used
to merge the source directories and build the archive (similar to the way 2.5 was).</p>
-<p>For all other documentation please refer to the <a href="">cTAKES
2.5 documentation</a>.</p>
+    <h1 id="this-page-is-under-construction">This page is under construction</h1>
+<h1 id="ctakes-30-user-guide">cTAKES 3.0 User Guide</h1>
+<p>cTAKES users are those who wish to use cTAKES as it is without code modifications.
+With these instructions you can install cTAKES, configure it, and use it to process text.
+cTAKES is built around analysis of text associated with a medical record. If you plan to expand, change, or modify the
+code within cTAKES, refer to the <a href="/3.0.0/developer-guide-3.0">cTAKES 3.0 Developer
+<p>There are GUIs for configuring cTAKES and viewing results; however, there are no summaries, statistics, or graphs.
+The results are a large number of annotations recorded in <a href="">UIMA XMI files</a>.
+You can see and sift through the results, but more processing is required to reap the benefits of the annotations.
+The process that you set up to do these annotations is called a pipeline.</p>
+<p>These instructions will cover installation of cTAKES and test of some components

+including trained models for sentence detection and tagging parts of speech,
+dictionaries from a subset of the UMLS, a very small subset of the full LVG
+resource, etc.</p>
+<p>Further exploitation of the software's ability may require a few additional steps.
+For example, you may want to use a different dictionary in order to include vocabulary from
your institution.</p>
+<h2 id="install-ctakes">Install cTAKES</h2>
+<li>Make sure you have <a href="">Java</a>
1.6 or higher. Many systems come with Java already
+installed. Run this command to check your version:
+java -version
+<p>Download the <a href="NotYetAvailable"><strong></strong></a>
+Save the file to a temporary location on your machine.</p>
+<p>Unzip the ZIP file into the directory that you want to be the cTAKES installation home.
+We will call this directory <strong>&lt;cTAKES_HOME&gt;</strong>. You
will need to refer to this later. <strong>Windows</strong>: <code>c:\cTAKES-3.0</code>
<strong>Linux</strong>: <code>    /usr/bin/cTAKES-3.0</code><br
+<h2 id="process-documents-using-ctakes">Process documents using cTAKES</h2>
+<p>cTAKES allows you to use most components in two different ways:</p>
+<li>Using cTAKES CAS Visual Debugger (CVD) to view the results stored as XMI files
or run the annotators or</li>
+<li>Using cTAKES collection processing engine (CPE) to process documents in &lt;cTAKES_HOME&gt;/testdata
+<h3 id="cas-visual-debugger-cvd">CAS Visual Debugger (CVD)</h3>
+<p>The main purpose of the <a href="">CAS
Visual Debugger (CVD)</a> 
+is to let you browse all the data that is created when you run a component over some text.
+A component is also called an "analysis engine", as it can be made up of multiple annotators.</p>
+<p>Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.<br
+<strong>Windows</strong>: <code>cd \cTAKES-3.0</code> <strong>Linux</strong>:
<code>cd /usr/bin/cTAKES-3.0</code><br />
+&nbsp;<br />
+<strong>Note:</strong> &lt;cTAKES_HOME&gt; must be your current directory
unless you are skilled at setting
+paths on your machine.</p>
+<p>Start the CAS Visual Debugger by running this command. The application may take
a minute to start on slower hardware:<br />
+<strong>Windows</strong>: <code>runctakesCVD.bat</code> <strong>Linux</strong>:
+<p>An analysis engine (AE) needs to be loaded in order to process text.<br />
+Use the <strong>Run</strong> -&gt; <strong>Load AE</strong> menu
bar command. Navigate to the file: <code>&lt;cTAKES_HOME&gt;/cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextProcessor.xml</code>
Click <strong>Open</strong>.</p>
+<p>Copy the text in this example and paste the contents into the Text section of CVD,
replacing the text that is already
+there. This example file can also be found in test data: <code>&lt;cTAKES_HOME&gt;/testdata/cdptest/testinput/plaintext/testpatient_plaintext_1.txt</code>
+Dr. Nutritious
+Medical Nutrition Therapy for Hyperlipidemia
+Referral from: Julie Tester, RD, LD, CNSD
+Phone contact: (555) 555-1212
+Height: 144 cm Current Weight: 45 kg Date of current weight: 02-29-2001
+Admit Weight: 53 kg BMI: 18 kg/m2
+Diet: General
+Daily Calorie needs (kcals): 1500 calories, assessed as HB + 20% for activity.
+Daily Protein needs: 40 grams, assessed as 1.0 g/kg.
+Pt has been on a 3-day calorie count and has had an average intake of 1100 calories.
+She was instructed to drink 2-3 cans of liquid supplement to help promote weight gain.
+She agrees with the plan and has my number for further assessment. May want a Resting
+Metabolic Rate as well. She takes an aspirin a day for knee pain.
+<p>From the menu bar, click <strong>Run</strong> -&gt; <strong>Run
+You'll get a list of all the annotations in the Analysis Results frame.</p>
+<p>Named entities are now recognized in this clinical text. Annotations of
+MedicationEventMention and EntityMention are created. To find one, in the
+<strong>Analysis Results frame</strong>, click on the keys in front of:
+<li>Then select <strong>edu.mayo.bmi.uima.core.type.textsem.EntityMention</strong>
+This will show an Annotation Index in the lower frame. Select any
+annotation in that lower frame and you will see the text discovered in the
+Text frame on the right.</li>
+<h3 id="collection-processing-engine-cpe">Collection processing engine (CPE)</h3>
+<p>The <a href="">Collection
Processing Engine (CPE) Configuration GUI</a> is for configuring components (also known as analysis
engines) into a pipeline that processes documents.</p>
+<p>Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.<br
+<strong>Windows</strong>: <code>cd \cTAKES-3.0</code> <strong>Linux</strong>:
<code>cd /usr/bin/cTAKES-3.0</code><br />
+&nbsp;<br />
+<strong>Note:</strong> &lt;cTAKES_HOME&gt; must be your current directory
unless you are skilled at setting
+paths on your machine.</p>
+<p>Start the Collection Processing Engine (CPE) by running this command. The application
may take a minute to start on slower hardware:<br />
+<strong>Windows</strong>: <code>runctakesCPE.bat</code> <strong>Linux</strong>:
+<p>This will bring up the Collection Processing Engine Configurator. In the
+Menu bar click <strong>File</strong> &gt; <strong>Open CPE Descriptor</strong>.</p>
+<p>Navigate to the file: <code>&lt;cTAKES_HOME&gt;/cTAKESdesc/cdpdesc/collection_processing_engine/test_plaintext.xml</code>
Click <strong>Open</strong>.</p>
+<p>Click the Play button (green/blue <strong>play arrow</strong> near the
+<p>You should see that one document was processed. The CPE processes a collection of documents;
+in this case, the collection contained only one document,
+to demonstrate the procedure.<br />
+Close the results window.</p>
+<p>Close the CPE application. You may be prompted to save changes. Since this
+was just a test you may click the <strong>No</strong> button.</p>
+<h3 id="validate-cpe-results">Validate CPE Results</h3>
+<p>Open a command prompt and change to the &lt;cTAKES_HOME&gt; directory.<br
+<strong>Windows</strong>: <code>cd \cTAKES-3.0</code> <strong>Linux</strong>:
<code>cd /usr/bin/cTAKES-3.0</code><br />
+<p>To test the results, you will use a comparison tool that will help show that the
+results match expectations. Enter this command:
+java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare <First File> <Second
File> <diff-html>
+Where: <strong><em>&lt;First File&gt;</em></strong> is the
first file to compare; <strong><em>&lt;Second File&gt;</em></strong>
+is the second file to compare; and <strong><em>&lt;diff-html&gt;</em></strong>
is where the results are written.
+For example:
+java -cp cTAKES.jar edu.mayo.bmi.utils.xcas_comparison.Compare ^
+"testdata\cdptest\testoutput\plaintext\sample_note_plaintext.xml" ^
+"testdata\cdptest\testsampleoutput\plaintext\sample_note_plaintext.xml" ^
+java edu.mayo.bmi.utils.xcas_comparison.Compare \
+"/usr/bin/cTAKES2.5/testdata/cdptest/testoutput/plaintext/sample_note_plaintext.xml" \
+Copy and paste the example above, which has had our example
+files already substituted, into a command prompt to run. In this case we have
+shipped an example of what the output should be for you to compare against.</p>
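The comparison command above can also be assembled programmatically, which avoids the differences between Windows (`^`) and Linux (`\`) line continuations. This is a minimal sketch, not part of cTAKES: the helper names `build_compare_command` and `run_compare` are hypothetical, and only the jar name and class name are taken from the guide text.

```python
import subprocess
from pathlib import Path

# Class name and jar name are taken from the guide text above.
COMPARE_CLASS = "edu.mayo.bmi.utils.xcas_comparison.Compare"


def build_compare_command(first_file, second_file, diff_html, jar="cTAKES.jar"):
    """Return the argument list for the comparison tool (hypothetical helper)."""
    return [
        "java", "-cp", str(jar), COMPARE_CLASS,
        str(first_file), str(second_file), str(diff_html),
    ]


def run_compare(ctakes_home, first_file, second_file, diff_html):
    """Run the comparison from <cTAKES_HOME>; raises if java exits with an error."""
    cmd = build_compare_command(first_file, second_file, diff_html)
    # cwd matters: the guide expects <cTAKES_HOME> to be the current directory.
    subprocess.run(cmd, cwd=Path(ctakes_home), check=True)
```

Passing the arguments as a list means paths containing spaces need no extra quoting on either platform.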
+<p>The resulting file will open for you. Look at the comparison to see the
+annotations resulting from this pipeline.
+<strong>Windows</strong>: <code>c:\stuff\diff-html.html</code> <strong>Linux</strong>:
<code>/tmp/diff-html.html</code><br />
+<p>Using the same CVD and CPE programs in the manner described above, you can
+test all the other components. The analysis engines and collection processing
+engines shipped with cTAKES for some of the annotators are described in the
+following table.</p>
+<th>Example Analysis Engine (AE)</th>
+<th>Example Collection Processing Engine (CPE)</th>
+<th>Example test data</th>
+<td>Clinical Document Pipeline</td>
+<td>the complete cTAKES pipeline to obtain majority of cTAKES annotations</td>
+<td>obtain cTAKES chunking annotations</td>
+<td>Dependency Parser</td>
+<td>obtain dependency parsing tree</td>
+<td>Drug NER</td>
+<td>the annotator to obtain drug annotations</td>
+<td>Dictionary Lookup</td>
+<td>mapping cTAKES annotations to dictionaries (e.g., SNOMED_CT or RxNorm)</td>
+<td>PAD Term Spotter</td>
+<td>identifying terms related to PAD</td>
+<td>Smoking Status</td>
+<td>the annotator to obtain document or patient-level smoking status</td>
+<td>Side Effect</td>
+<td>the annotator to find side effect mentions and sentences from clinical documents</td>
+<h2 id="next-steps">Next Steps</h2>
+<p>The <a href="3.0.0/component-use-guide-3.0">cTAKES 3.0 Component Use Guide</a>
will help you to
+understand in great detail each of the cTAKES components that have been
+installed. In some cases you can learn how to improve the components. However,
+before you go on to process text in production you will need to consider
+dictionaries and models.</p>
+<h3 id="dictionaries">Dictionaries</h3>
+<h4 id="bundled-umls-dictionaries">Bundled UMLS Dictionaries</h4>
+<p>cTAKES includes the complete UMLS (SNOMED-CT and RxNorm) dictionaries.</p>
+<li>An rxnorm_index database (a Lucene index) containing drug names from RxNorm</li>
+<li>A UMLS database (using two hsqldb tables) containing anatomical sites, procedures,
signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)</li>
+<p>To use them, you must have a UMLS username and password, and an Internet
+<p><strong>Note</strong>: If you do not have a UMLS username and password,
you may request one at <a href="">UMLS
+Terminology Services</a>.</p>
+<p>In order to use the UMLS dictionaries shipped with cTAKES you will need to do
+two things:</p>
+<li>Replace the UMLSUser and UMLSPW &lt;nameValuePair&gt; strings in these descriptor
+files with your UMLS username and password.</li>
+<li>Dictionary Lookup: &lt;cTAKES_HOME&gt;/cTAKESdesc/lookupdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml</li>
+<li>(optional) Drug NER: &lt;cTAKES_HOME&gt;/cTAKESdesc/drugnerdesc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
+The following shows where in the files you would make the changes. (Do not
+change the &lt;configurationParameters&gt; by the same name.)
+<li>Include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within your
+aggregate Analysis Engine or switch to the ones provided by cTAKES. cTAKES has
+provided duplicates of shipped Analysis Engine descriptors, put UMLS in the
+name, and placed DictionaryLookupAnnotatorUMLS.xml within them for these
+<li>Dictionary Lookup</li>
+<li>Clinical Documents pipeline</li>
+<li>Drug NER</li>
+<li>Side Effect</li>
+<p>So you simply need to switch to those descriptors. For example, if you
+were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, you
+would switch to AggregateCdaUMLSProcessor.xml instead, which
+hooks into the complete dictionaries.</p>
+<p>You can, of course, modify your own aggregate Analysis Engine files and place
+the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.</p>
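The UMLSUser and UMLSPW edit described earlier can be scripted rather than done by hand. The sketch below is an illustration only: the function name `set_umls_credentials` is hypothetical, and it assumes the usual UIMA descriptor layout in which each `<nameValuePair>` holds a `<name>` and a `<value><string>` element. Check your own descriptor files before relying on it.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def set_umls_credentials(xml_text, user, password):
    """Return descriptor XML with the UMLSUser/UMLSPW nameValuePair strings replaced."""
    root = ET.fromstring(xml_text)
    wanted = {"UMLSUser": user, "UMLSPW": password}
    for pair in root.iter():
        if local(pair.tag) != "nameValuePair":
            continue  # leaves <configurationParameter> entries of the same name alone
        name = next((c for c in pair if local(c.tag) == "name"), None)
        if name is None or (name.text or "").strip() not in wanted:
            continue
        # Assumed layout: the value is stored in a nested <string> element.
        for node in pair.iter():
            if local(node.tag) == "string":
                node.text = wanted[name.text.strip()]
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops XML comments and may rewrite namespace prefixes when serializing, so keep a backup of the original descriptor.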
+<p>Since this is an in-memory database implementation, please be patient during
+the initial load as it could take approximately 20-30 seconds for the database
+to initialize.</p>
+<p>If you would like to go back to using the small sample dictionaries that do
+not require a UMLS username, use the DictionaryLookupAnnotator.xml (UMLS is
+not in the file name) Analysis Engine descriptor in your aggregate. Just
+removing your password from the DictionaryLookupAnnotatorUMLS.xml files will
+not switch you back to the small sample dictionaries.</p>
+<h4 id="lvg">LVG</h4>
+<p>We have successfully tested the 2008 release of the full <a href="http://lexsrv2.">LVG</a>
+data. In order to use this release of the full LVG data you should:</p>
+<li>Download either the full version or the lite version from <a href="">NIH
Lexical Tools</a></li>
+<li>Extract the TGZ file that you downloaded with a tool like 7-zip (available online)
to a temporary directory. On some operating systems, like Windows, this may need to be done
in two steps: 1) decompress the gzip file, then 2) extract the resulting tar archive.</li>
+<li>Replace the directory &lt;cTAKES_HOME&gt;/resources/lvgresources/lvg/data/HSqlDb
with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.</li>
+<li>In the future, you can upgrade to later versions of LVG by editing the &lt;cTAKES_HOME&gt;/resources/lvgresources/lvg/data/config/
file, replacing "lvg2008" with the name of the new release.</li>
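The directory replacement step in the list above could be scripted roughly as follows. This is a sketch under assumptions: the function name `replace_lvg_hsqldb` is hypothetical, and `extracted_root` is taken to be whichever extracted directory directly contains `data/HSqlDb`. It does not touch the config file mentioned in the last step.

```python
import shutil
from pathlib import Path


def replace_lvg_hsqldb(extracted_root, ctakes_home):
    """Replace the bundled LVG HSqlDb directory with the one from an extracted download."""
    source = Path(extracted_root) / "data" / "HSqlDb"
    target = (
        Path(ctakes_home) / "resources" / "lvgresources" / "lvg" / "data" / "HSqlDb"
    )
    if not source.is_dir():
        raise FileNotFoundError(f"no data/HSqlDb under {extracted_root}")
    if target.exists():
        # The guide says replacing the entire directory is appropriate.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

Removing the old directory first ensures that no files from the small bundled LVG subset remain mixed in with the full data.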
+<h4 id="building-your-own-dictionaries">Building Your Own Dictionaries</h4>
+<p>To install customized dictionaries for RxNorm, SNOMED-CT, or other
+vocabularies that are available through the UMLS, see the following posts on
+the cTAKES forums:</p>
+<li><a href=";t=423">;t=423</a></li>
+<li><a href=";t=80&amp;start=20#p1459">;t=80&amp;start=20#p1459</a></li>
+<h3 id="models">Models</h3>
+<p>Some models included in cTAKES may not represent your data distribution well.
+If you want to build or train your own models, please read the <a href="3.0.0/component-use-guide-3.0">cTAKES
+Component Use Guide</a>,
+<li><a href="NotYet Available">Training a sentence detector model</a></li>
+<li>Training a Part of Speech (POS) tagger model (Building a model Obtaining training
+<li>Creating a Part of Speech (POS) tag dictionary (Building a tag dictionary)</li>
+<li>Training a chunker model (Building a model - Prepare GENIA training data)</li>
+<li>Training a dependency parser (Dependency Parser)</li>
  <div id="footera">
