incubator-ctakes-commits mailing list archives

Subject svn commit: r838602 - in /websites/staging/ctakes/trunk/content: ./ ctakes/2.6.0/ctakes-2.6-Smoking-Status.html
Date Fri, 16 Nov 2012 16:46:46 GMT
Author: buildbot
Date: Fri Nov 16 16:46:45 2012
New Revision: 838602

Staging update by buildbot for ctakes

    websites/staging/ctakes/trunk/content/   (props changed)

Propchange: websites/staging/ctakes/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Fri Nov 16 16:46:45 2012
@@ -1 +1 @@

Added: websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html
--- websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html (added)
+++ websites/staging/ctakes/trunk/content/ctakes/2.6.0/ctakes-2.6-Smoking-Status.html Fri Nov 16 16:46:45 2012
@@ -0,0 +1,372 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+ 2.0
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+<link href="/ctakes/css/ctakes.css" rel="stylesheet" type="text/css">
+<title>cTAKES 2.6 Smoking Status</title>
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <div class="banner">
+      <div id="bannerleft">
+		<a href=""><img src=""
alt="The Apache Software Foundation" border="0"/></a>
+	<br/>
+			<img alt="cTAKES logo" src="/ctakes/images/ctakes_logo.jpg" border="0"/>
+      </div>  
+    <div id="bannerright">	
+	      <img id="asf-logo" alt="Apache Incubator" src=""/>
+	  </div>
+ </div>  
+  <div id="clear"></div>
+  <div id="sidenav">
+    <h1 id="general">General</h1>
+<li><a href="/ctakes/index.html">About</a></li>
+<li><a href="/ctakes/gettingstarted.html">Getting Started</a></li>
+<li><a href="/ctakes/downloads.html">Downloads</a></li>
+<li><a href="/ctakes/glossary.html">Glossary</a></li>
+<h1 id="community">Community</h1>
+<li><a href="/ctakes/get-involved.html">Get Involved</a></li>
+<li><a href="">Bug Tracker</a></li>
+<li><a href="/ctakes/mailing-lists.html">Mailing Lists</a></li>
+<li><a href="/ctakes/people.html">People</a></li>
+<li><a href="">Incubator page</a></li>
+<li><a href="/ctakes/license.html">License</a></li>
+<li><a href="/ctakes/history.html">History</a></li>
+<li><a href="/ctakes/community-faqs.html">Community FAQs</a></li>
+<h1 id="users">Users</h1>
+<li><a href="/ctakes/userguide.html">User Guide</a></li>
+<li><a href="/ctakes/user-faqs.html">User FAQs</a></li>
+<h1 id="developers">Developers</h1>
+<li><a href="/ctakes/developerguide.html">Developer Guide</a></li>
+<li><a href="/ctakes/developer-faqs.html">Developer FAQs</a></li>
+<h1 id="ppmc">PPMC</h1>
+<li><a href="/ctakes/ppmc-faqs.html">PPMC FAQs</a></li>
+<li><a href="/ctakes/ctakes-release-guide.html">Release Guide</a></li>
+<h1 id="asf">ASF</h1>
+<li><a href="">Apache Software Foundation</a></li>
+<li><a href="">Thanks</a></li>
+<li><a href="">Become a Sponsor</a></li>
+  </div>
+  <div id="contenta">
+    <h1 id="ctakes-26-smoking-status">cTAKES 2.6 - Smoking status</h1>
+<h2 id="overview-of-smoking-status">Overview of Smoking status</h2>
+<p>The "smoking status" pipeline processes flat files or CDA (Clinical Document
+Architecture) documents to classify patient records into five pre-determined
+categories - past smoker (P), current smoker (C), smoker (S), nonsmoker (N),
+and unknown (U), where a past and current smoker are distinguished based on
+temporal expressions in the patient's medical records.</p>
+<h2 id="analysis-engines-annotator">Analysis engines (annotator)</h2>
+<h3 id="simulatedprodsmokingtaexml">SimulatedProdSmokingTAE.xml</h3>
+<p>The file desc/analysis_engine/SimulatedProdSmokingTAE.xml provides a working
+example of the smoking status pipeline built from the aggregate TAEs. This
+aggregate includes Token, Sentence, SentenceAdjuster, ClassifiableEntries
+(which in turn invokes the ProductionPostSentenceAggregate annotators).</p>
+<p>Shipped with this annotator:</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>SimulatedProdSmokingTAE_CDA.xml is also provided to process CDA documents. The
+aggregate flow will contain the annotator version
+ExternalBaseAggregateTAE_CDA.xml which will process the document as a Clinical
+Document Architecture (CDA) file.</p>
+<h3 id="productionpostsentenceaggregate_step1xml">ProductionPostSentenceAggregate_step1.xml</h3>
+<p>The file desc/analysis_engine/ProductionPostSentenceAggregate_step1.xml
+is the Aggregate TAE used to run the first classification stage via the
+<li>TokenizerAnnotator (core project)</li>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>This annotator is not contained in the aggregate flow, but is introduced via the
+resource settings of the ClassifiableEntriesAnnotator (see the method
+initialize() in this class).
+UIMAFramework.produceAnalysisEngine(taeSpecifierStep1, ResMgr, null)
+instantiates the AE and retrieves the CAS.</p>
+<h3 id="productionpostsentenceaggregate_step2_libsvmxml">ProductionPostSentenceAggregate_step2_libsvm.xml</h3>
+<p>The file desc/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml
+is the Aggregate TAE used to run the second classification stage via the
+libSVM training module. Shipped with this annotator:</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>This annotator is not contained in the aggregate flow, but is introduced via the
+resource settings of the ClassifiableEntriesAnnotator (see the method
+initialize() in this class).
+UIMAFramework.produceAnalysisEngine(taeSpecifierStep2, ResMgr, null)
+instantiates the AE, and the process method of ClassifiableEntriesAnnotator
+runs it if the smoking status is known.</p>
+<h3 id="externalbaseaggregatetaexml">ExternalBaseAggregateTAE.xml</h3>
+<p>The file desc/analysis_engine/ExternalBaseAggregateTAE.xml provides an
+aggregate flow for the external annotators: SimpleSegmentAnnotator,
+TokenizerAnnotator, SentenceDetectorAnnotator, and LvgAnnotator. Shipped with
+this annotator:</p>
+<li>TokenizerAnnotator (core project),</li>
+<li>SentenceDetectorAnnotator (core project),</li>
+<li>LvgAnnotator (LVG project).</li>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>ExternalBaseAggregateTAE_CDA.xml is also provided to process CDA documents.
+Its aggregate flow contains the specialized class CdaCasInitializer
+(replacing the SimpleSegmentAnnotator used by the flat-file, non-CDA version), which
+processes the document as a Clinical Document Architecture (CDA) file. This
+annotator is contained in the SimulatedProdSmokingTAE_CDA aggregate. Red text
+indicates components shipped with this annotator.</p>
+<h3 id="sentenceadjusterxml">SentenceAdjuster.xml</h3>
+<p>The file desc/analysis_engine/SentenceAdjuster.xml drives the Java class annotator that uses patterns, and
+rules about those patterns, to adjust certain annotations. This annotator
+was extended to handle sentence boundaries for the smoking status pipeline.</p>
+<p>Example: "Tobacco: none" has two sentences as detected by the original cTAKES
+sentence boundary detector. This annotator merges them into one sentence to
+enable correct negation detection.</p>
+<p><strong>Parameters</strong><br />
+UseSegments &lt;Boolean/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'false') Flag whether to use segments or full doc text.</p>
+<p>SegmentsToSkip &lt;String/Multi-valued/Optional&gt;</p>
+<p>WordsToIgnore &lt;String/Multi-valued/Optional&gt;</p>
+<p>(Default Value = 'null') Set of words that PostModifier should ignore (act as
+if the word was not there) when looking for a pattern match.</p>
+<p>WordsInPattern &lt;String/Multi-valued/Required&gt;</p>
+<p>(Default Value = 'no none never quit smoked ;') The list of words ("none",
+"no", etc) used in the pattern.</p>
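+<p>The merge described in the "Tobacco: none" example can be illustrated with a
+small sketch. This is not the SentenceAdjuster implementation: the rule used here
+(merge when the previous sentence ends with a colon and the next one starts with
+a WordsInPattern word) and the function name merge_split_sentences are
+assumptions made for illustration only.</p>

```python
# Default WordsInPattern values taken from the parameter description above.
WORDS_IN_PATTERN = {"no", "none", "never", "quit", "smoked", ";"}


def merge_split_sentences(sentences, words_in_pattern=WORDS_IN_PATTERN):
    """Merge a sentence into its predecessor under an ASSUMED rule:
    the predecessor ends with ':' and this sentence starts with a pattern word."""
    merged = []
    for sentence in sentences:
        tokens = sentence.split()
        # Normalize the first token so that "None." still matches "none".
        first_word = tokens[0].lower().strip(".,") if tokens else ""
        if merged and merged[-1].endswith(":") and first_word in words_in_pattern:
            merged[-1] = merged[-1] + " " + sentence
        else:
            merged.append(sentence)
    return merged


result = merge_split_sentences(["Tobacco:", "none"])
```

+<p>With the two fragments from the example as input, the sketch returns a single
+sentence, which is the form the negation step needs.</p>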
+<h3 id="classifiableentriesannotatorxml">ClassifiableEntriesAnnotator.xml</h3>
+<p>The file desc/analysis_engine/ClassifiableEntriesAnnotator.xml drives the java
+class Converts Sentences to
+ClassifiableEntries (required by SmokingStatus pipeline) and ultimately to
+<p><strong>Parameters</strong><br />
+TruthFile &lt;String/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'null') Delimited Truth file. Delimiter is expected to be the
+TAB char. If not specified, then the classification feature of the
+RecordSentence object will not be set.</p>
+<p>AllowedClassifications &lt;String/Multi-valued/Optional&gt;</p>
+UNKNOWN"') See for permitted string values.</p>
+<p>SectionsToIgnore &lt;String/Multi-valued/Optional&gt;</p>
+<p>(Default Value = '"20109" "20138"') Sections to ignore for ClassifiableEntries
+- Family History (20109). A given patient's smoking status could be confused
+by smoking status of others. To avoid this confusion there is an option to
+exclude certain sections such as family history.</p>
+<p>ConWordsFile &lt;Boolean/Single-valued/Optional&gt;</p>
+<p>(Default Value =
+Contradiction words list. If this word appears in sentence do not negate.</p>
+<p><strong>Resources</strong><br />
+<p>(Default Value =
+Annotator responsible for the first classification step, namely,
+<p>(Default Value =
+'$main_root/desc/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml')
+Annotator responsible for second classification step.</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>UimaDescriptorStep1 and UimaDescriptorStep2 are introduced as resources by
+the ClassifiableEntriesAnnotator during its initialization step.
+This allows the specified aggregates to be instantiated and their analysis
+processing to be handled on a separate asynchronous thread. This improves
+overall performance: the output of the ProductionPostSentenceAggregates is
+prepared for the process method without requiring a synchronized data flow
+(i.e. an explicit aggregate flow in the component descriptor).</p>
+<h3 id="kurulebasedclassifierannotatorxml">KuRuleBasedClassifierAnnotator.xml</h3>
+<p>The file desc/analysis_engine/KuRuleBasedClassifierAnnotator.xml drives the
+java class Known vs
+Unknown classifier using smoking related keywords.</p>
+<p><strong>Parameters</strong><br />
+CaseSensitive &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Specifies if a distinction between lower and upper
+case text will be considered.</p>
+<p>classAttribute &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'smoking_status') Value used by the NominalAttributeValue via
+<p>SmokingWordsFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'ss/data/KU/keywords.txt') Smoking related keywords to
+identify "known" class.</p>
+<p>UnknownWordsFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'ss/data/KU/unknown_words.txt') If this word/phrase appears,
+treat the sentence as UNKNOWN.</p>
+<h3 id="pcsclassifierannotator_libsvmxml">PcsClassifierAnnotator_libsvm.xml</h3>
+<p>The file desc/analysis_engine/PcsClassifierAnnotator_libsvm.xml is the smoking status
+classifier using libsvm. This annotator plays the same role as
+PcsBOWFeatureAnnotator.xml, PcsClassifierAnnotator.xml, and
+BOWFeatureRemovalAnnotator.xml, which use libsvm.</p>
+<p><strong>Parameters</strong><br />
+CaseSensitive &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Specifies if a distinction between lower and upper
+case text will be considered.</p>
+<p><strong>Resources</strong><br />
+<p>(Default Value = 'file:ss/data/PCS/stopwords_PCS.txt')</p>
+<p>Resource file that provides terms used as stop words, e.g. "a" "an" "the".</p>
+<p>(Default Value = 'file:ss/data/PCS/keywords_PCS.txt')</p>
+<p>Resource file that provides terms used as PCS key words, e.g. '"refrain"
+"discussed" "to_quit" (if bigram it is connected by underscore, i.e. "_")'.</p>
+<p>(Default Value = 'file:ss/data/PCS/pcs_libsvm-2.91.model')</p>
+<p>Resource file that provides trained model for smoking status classification.</p>
+<h3 id="artificialsentenceannotatorxml">ArtificialSentenceAnnotator.xml</h3>
+<p>The file desc/analysis_engine/ArtificialSentenceAnnotator.xml drives the java
+class Artificially creates a new
+SentenceAnnotation object by treating the entire document as a sentence. The
+offset values from the DocumentAnnotation object are transferred over to the
+new SentenceAnnotation object.</p>
+<p><strong>Parameters</strong><br />
+srcObjClass &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Source JCas object class.</p>
+<p>This must be an object that already exists in the JCas.</p>
+<p>destObjClass &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false') Destination JCas object class.</p>
+<p>A new JCas object will be created.</p>
+<p>dataBindMap &lt;String/Multi-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Binds data from source to destination.</p>
+<p>Format for each entry is the getter method name of the source to the setter
+method name of the destination. e.g. getMyValue|setMyValue</p>
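+<p>The dataBindMap entry format can be illustrated with a short sketch. It is
+written in Python for brevity and is not the annotator's Java code. The names
+parse_data_bind_map, apply_bindings, Source and Destination are hypothetical,
+and the getBegin|setBegin and getEnd|setEnd entries are assumed examples of
+offset transfer, not values taken from the shipped descriptor.</p>

```python
def parse_data_bind_map(entries):
    """Split each 'getter|setter' entry into a (getter, setter) pair."""
    bindings = []
    for entry in entries:
        getter, sep, setter = entry.partition("|")
        if not sep or not getter or not setter or "|" in setter:
            raise ValueError(f"expected getter|setter, got {entry!r}")
        bindings.append((getter, setter))
    return bindings


def apply_bindings(source, destination, bindings):
    """Call each getter on the source and pass the value to the paired setter."""
    for getter, setter in bindings:
        getattr(destination, setter)(getattr(source, getter)())


class Source:  # hypothetical stand-in for the source JCas object
    def getBegin(self):
        return 0

    def getEnd(self):
        return 42


class Destination:  # hypothetical stand-in for the newly created JCas object
    def setBegin(self, value):
        self.begin = value

    def setEnd(self, value):
        self.end = value


bindings = parse_data_bind_map(["getBegin|setBegin", "getEnd|setEnd"])
destination = Destination()
apply_bindings(Source(), destination, bindings)
```

+<p>After the call, the destination object carries the same begin and end offsets
+as the source, which mirrors the offset transfer described for this annotator.</p>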
+<h3 id="smokingstatusdictionarylookupannotatorxml">SmokingStatusDictionaryLookupAnnotator.xml</h3>
+<p>The file desc/analysis_engine/SmokingStatusDictionaryLookupAnnotator.xml
+drives the java class
+Performs dictionary lookup and stores the hits as NamedEntityAnnotation
+<p><strong>Resources</strong><br />
+<p>(Default Value = 'file:ss/data/SmokingStatusLookupConfig.xml')</p>
+<p>Defines which dictionaries will be used, the implementation specifics, and
+metaField configuration.</p>
+<p>(Default Value = 'file:ss/data/smoker.dictionary')</p>
+<p>Resource file that provides terms used as smoking words, e.g. '"smokes"
+<p>(Default Value = 'file:ss/data/nonsmoker.dictionary')</p>
+<p>Resource file that provides terms used as non-smoking words, e.g. '"non-
+<h3 id="negationannotatorxml">NegationAnnotator.xml</h3>
+<p>The file desc/analysis_engine/NegationAnnotator.xml drives the java class
+edu.mayo.bmi.uima.context.ContextAnnotator. Boundary tokens moved to external
+resource - ss/data/context/boundaryData.txt.</p>
+<p><strong>Resources</strong><br />
+<p>(Default Value = 'file:ss/data/context/boundaryData.txt')</p>
+<p>Resource file that provides terms used as sentence boundaries, e.g.
+'"nevertheless" "how" ";" "."'.</p>
+<p><img alt="" src="/images/icons/emoticons/information.png" /></p>
+<p>The parameters provided act the same way as in the core's version of the
+'NegationAnnotator', but since the boundary stop words are different for the
+smoking status pipeline, a separate implementation was necessary. However, the
+current release of 'NegationAnnotator' does not use this resource.</p>
+<h2 id="cas-consumers-recordresolutioncasconsumerxml">CAS consumers - RecordResolutionCasConsumer.xml</h2>
+<p>The CAS consumer provided in
+/desc/cas_consumer/RecordResolutionCasConsumer.xml drives the Java class that iterates over all
+sentences (each CAS equals one sentence) for a record and resolves the final
+classification value for the record. Output is saved to a delimited file.
+Optionally, it also provides the overall patient-level classification
+based on the record-level classifications.</p>
+<p><strong>Parameters</strong><br />
+OutputFile &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = 'c:\temp\record_resolution.txt')</p>
+<p>Specifies the location of the detail and summary report.</p>
+<p>Delimiter &lt;String/Single-valued/Required&gt;</p>
+<p>(Default Value = '|')</p>
+<p>Specifies the delimiter for the output file.</p>
+<p>ProcessingCDADocument &lt;Boolean/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Specifies whether the processed files should be handled as CDA documents.</p>
+<p>RunPatientLevelClassification &lt;Boolean/Single-valued/Required&gt;</p>
+<p>(Default Value = 'false')</p>
+<p>Specifies whether the post processing step of generating a summary patient
+level classification is done.</p>
+<p>FinalClassificationOutputFile &lt;String/Single-valued/Optional&gt;</p>
+<p>(Default Value = 'null')</p>
+<p>Specifies name and location of the summary report file which holds the final
+patient level classifications.</p>
+<p><strong>Resources</strong><br />
+<p>The support vector machine (SVM) classification tool provided at
+/lib/libsvm-2.91.jar is used to train the smoking status model.</p>
+<h2 id="how-to-create-your-own-smoking-status-classifier-model">How to create your own smoking status classifier model</h2>
+<li>Create sentence-level smoking status data in the format sentence|class_label (class_label: P, C, S).</li>
+<p>He quit smoking three years ago.|P<br />
+She is smoking currently.|C<br />
+The patient has a history of tobacco use.|S</p>
+<li>Run the script on the sentence-level
smoking status data to generate the libSVM training data.</li>
+<p>In this script, the variable "dataFile" in main() must point to the sentence-
+level smoking status data. Set the other variables also if necessary. Users
+might create their own keywordFile that contains keywords used in smoking
+status classification (see for details.)</p>
+<li>Create new model on the libSVM training data.</li>
+<p>The command with our options used in the current model is:</p>
+<p><strong>java -classpath path_of_libsvm_jar_file svm_train -s 0 -t 1 -g 1 -r 1 -d 1 training_data_file new_model</strong><br />
+Users might use their own customized libSVM options.</p>
+<li>Save new_model in the resources/ss/data/PCS/</li>
+<li>Change the Resources of "PathOfModel" in PcsClassifierAnnotator_libsvm.xml to "new_model"</li>
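+<p>Before running the steps above, it can help to check the sentence-level data
+and to assemble the training command programmatically. The sketch below is a
+helper written for illustration, not part of cTAKES: the function names
+read_labeled_sentences and svm_train_command are hypothetical, and the file
+names passed to them are placeholders. The svm_train options are copied from
+the command shown above.</p>

```python
import io

VALID_LABELS = {"P", "C", "S"}  # class labels listed in the first step


def read_labeled_sentences(lines):
    """Parse 'sentence|class_label' lines into (sentence, label) pairs."""
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the LAST '|' so a sentence may itself contain '|'.
        sentence, sep, label = line.rpartition("|")
        if not sep or label not in VALID_LABELS:
            raise ValueError(f"line {number}: expected sentence|P, C or S, got {line!r}")
        pairs.append((sentence, label))
    return pairs


def svm_train_command(libsvm_jar, training_data_file, new_model):
    """Build the argument list for the svm_train call shown in the document."""
    return ["java", "-classpath", libsvm_jar, "svm_train",
            "-s", "0", "-t", "1", "-g", "1", "-r", "1", "-d", "1",
            training_data_file, new_model]


sample = io.StringIO(
    "He quit smoking three years ago.|P\n"
    "She is smoking currently.|C\n"
    "The patient has a history of tobacco use.|S\n"
)
pairs = read_labeled_sentences(sample)
command = svm_train_command("libsvm-2.91.jar", "training_data_file", "new_model")
```

+<p>The returned list can be passed to a process launcher such as
+subprocess.run once the libSVM training data has been generated.</p>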
+  </div>
+ <div id="footera">
+    <div id="copyrighta">
+      <p>Copyright &#169; 2011 The Apache Software Foundation, Licensed under the
<a href="">Apache License, Version 2.0</a>.<br/>Apache
and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
+    </div>
+ </div>
