Subject: svn commit: r1735302 [1/2] - in /ctakes/trunk/ctakes-temporal: ./ src/main/java/org/apache/ctakes/temporal/ae/ src/main/java/org/apache/ctakes/temporal/ae/feature/ src/main/java/org/apache/ctakes/temporal/duration/ src/main/java/org/apache/ctakes/tempo...
Date: Wed, 16 Mar 2016 21:20:29 -0000
To: commits@ctakes.apache.org
From: clin@apache.org
Reply-To: dev@ctakes.apache.org
Mailing-List: contact commits-help@ctakes.apache.org; run by ezmlm
X-Mailer: svnmailer-1.0.9
Message-Id: <20160316212031.E89FF3A0184@svn01-us-west.apache.org>

Author: clin
Date: Wed Mar 16 21:20:21 2016
New Revision: 1735302

URL: http://svn.apache.org/viewvc?rev=1735302&view=rev
Log:
add word embedding features for DocTimeRel annotator. change back to old sentence detector.
add within sentence Relation annotator, which will leverage on normalized timex and their relations

Added:
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/WithinSentenceBeforeRelationAnnotator.java
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/ContinuousTextExtractor.java
    ctakes/trunk/ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/mimic_vectors.txt
Modified:
    ctakes/trunk/ctakes-temporal/pom.xml
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/DocTimeRelAnnotator.java
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/duration/Utils.java
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfBothEEAndETRelations.java
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/Evaluation_ImplBase.java

Modified: ctakes/trunk/ctakes-temporal/pom.xml
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/pom.xml?rev=1735302&r1=1735301&r2=1735302&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/pom.xml (original)
+++ ctakes/trunk/ctakes-temporal/pom.xml Wed Mar 16 21:20:21 2016
@@ -143,11 +143,11 @@
 scala-library 2.11.7 - +

Modified: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/DocTimeRelAnnotator.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/DocTimeRelAnnotator.java?rev=1735302&r1=1735301&r2=1735302&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/DocTimeRelAnnotator.java (original)
+++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/DocTimeRelAnnotator.java Wed Mar 16 21:20:21 2016
@@ -26,6 +26,7 @@
 import java.util.List;
 import java.util.Map;
 import
org.apache.ctakes.temporal.ae.feature.ClosestVerbExtractor; +import org.apache.ctakes.temporal.ae.feature.ContinuousTextExtractor; import org.apache.ctakes.temporal.ae.feature.EventPropertyExtractor; import org.apache.ctakes.temporal.ae.feature.NearbyVerbTenseXExtractor; import org.apache.ctakes.temporal.ae.feature.SectionHeaderExtractor; @@ -55,6 +56,7 @@ import org.cleartk.ml.feature.extractor. import org.cleartk.ml.feature.extractor.CleartkExtractor.Covered; import org.cleartk.ml.feature.extractor.CleartkExtractor.Following; import org.cleartk.ml.feature.extractor.CleartkExtractor.Preceding; +import org.cleartk.ml.feature.extractor.CleartkExtractorException; import org.cleartk.ml.feature.extractor.CombinedExtractor1; import org.cleartk.ml.feature.extractor.CoveredTextExtractor; import org.cleartk.ml.feature.extractor.TypePathExtractor; @@ -66,161 +68,179 @@ import org.cleartk.ml.jar.GenericJarClas public class DocTimeRelAnnotator extends CleartkAnnotator { - public static AnalysisEngineDescription createDataWriterDescription( - Class> dataWriterClass, - File outputDirectory) throws ResourceInitializationException { - return AnalysisEngineFactory.createEngineDescription( - DocTimeRelAnnotator.class, - CleartkAnnotator.PARAM_IS_TRAINING, - true, - DefaultDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, - dataWriterClass, - DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, - outputDirectory); - } - - public static AnalysisEngineDescription createAnnotatorDescription(String modelPath) - throws ResourceInitializationException { - return AnalysisEngineFactory.createEngineDescription( - DocTimeRelAnnotator.class, - CleartkAnnotator.PARAM_IS_TRAINING, - false, - GenericJarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH, - modelPath); - } - - /** - * @deprecated use String path instead of File. - * ClearTK will automatically Resolve the String to an InputStream. - * This will allow resources to be read within from a jar as well as File. 
- */ - public static AnalysisEngineDescription createAnnotatorDescription(File modelDirectory) - throws ResourceInitializationException { - return AnalysisEngineFactory.createEngineDescription( - DocTimeRelAnnotator.class, - CleartkAnnotator.PARAM_IS_TRAINING, - false, - GenericJarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH, - new File(modelDirectory, "model.jar")); - } - - private CleartkExtractor contextExtractor; - private SectionHeaderExtractor sectionIDExtractor; - private ClosestVerbExtractor closestVerbExtractor; - private TimeXExtractor timeXExtractor; - private EventPropertyExtractor genericExtractor; - private UmlsSingleFeatureExtractor umlsExtractor; - private NearbyVerbTenseXExtractor verbTensePatternExtractor; - -// private DateAndMeasurementExtractor dateExtractor; -// private CoveredTextToValuesExtractor disSemExtractor; -// private DurationExpectationFeatureExtractor durationExtractor; - - public static final String PARAM_PROB_VIEW = "ProbView"; - @ConfigurationParameter(name=PARAM_PROB_VIEW, mandatory=false) - private String probViewname = null; - - @Override - public void initialize(UimaContext context) throws ResourceInitializationException { - super.initialize(context); - CombinedExtractor1 baseExtractor = new CombinedExtractor1( - new CoveredTextExtractor(), - new TypePathExtractor(BaseToken.class, "partOfSpeech")); - this.contextExtractor = new CleartkExtractor( - BaseToken.class, - baseExtractor, - new Preceding(3), - new Covered(), - new Following(3)); - this.sectionIDExtractor = new SectionHeaderExtractor(); - this.closestVerbExtractor = new ClosestVerbExtractor(); - this.timeXExtractor = new TimeXExtractor(); - this.genericExtractor = new EventPropertyExtractor(); - this.umlsExtractor = new UmlsSingleFeatureExtractor(); - this.verbTensePatternExtractor = new NearbyVerbTenseXExtractor(); - -// this.dateExtractor = new DateAndMeasurementExtractor(); - -// try { -// Map word_disSem = CoveredTextToValuesExtractor.parseTextDoublesMap(new 
File("src/main/resources/embeddings.size25.txt"), Charsets.UTF_8); -// this.disSemExtractor = new CoveredTextToValuesExtractor("DisSemFeat", word_disSem); -// } catch (IOException e) { -// e.printStackTrace(); -// } -// this.durationExtractor = new DurationExpectationFeatureExtractor(); - } - - @Override - public void process(JCas jCas) throws AnalysisEngineProcessException { - for (EventMention eventMention : JCasUtil.select(jCas, EventMention.class)) { - List sents = JCasUtil.selectCovering(jCas, Sentence.class, eventMention); - List features = new ArrayList<>(); - if(sents!=null && sents.size()>0){ - features.addAll(this.contextExtractor.extractWithin(jCas, eventMention, sents.get(0))); - }else{ - features.addAll(this.contextExtractor.extract(jCas, eventMention)); - } - - features.addAll(this.sectionIDExtractor.extract(jCas, eventMention)); //add section heading - features.addAll(this.closestVerbExtractor.extract(jCas, eventMention)); //add closest verb - features.addAll(this.timeXExtractor.extract(jCas, eventMention)); //add the closest time expression types - features.addAll(this.genericExtractor.extract(jCas, eventMention)); //add the closest time expression types - features.addAll(this.umlsExtractor.extract(jCas, eventMention)); //add umls features - features.addAll(this.verbTensePatternExtractor.extract(jCas, eventMention));//add nearby verb POS pattern feature - - // - // features.addAll(this.dateExtractor.extract(jCas, eventMention)); //add the closest NE type - // features.addAll(this.durationExtractor.extract(jCas, eventMention)); //add duration feature - // features.addAll(this.disSemExtractor.extract(jCas, eventMention)); //add distributional semantic features - if (this.isTraining()) { - if(eventMention.getEvent() != null){ - String outcome = eventMention.getEvent().getProperties().getDocTimeRel(); - this.dataWriter.write(new Instance<>(outcome, features)); - } - } else { -// String outcome = this.classifier.classify(features); - Map scores = 
this.classifier.score(features); - Map.Entry maxEntry = null; - for( Map.Entry entry: scores.entrySet() ){ - if(maxEntry == null || entry.getValue().compareTo(maxEntry.getValue()) > 0){ - maxEntry = entry; - } - } - - if (probViewname != null){ - Map probs = SoftMaxUtil.getDistributionFromScores(scores); - try { - JCas probView = jCas.getView(probViewname); - for(String label : probs.keySet()){ - EventMention mention = new EventMention(probView); - mention.setId(eventMention.getId()); - mention.setConfidence(probs.get(label).floatValue()); - Event event = new Event(probView); - EventProperties props = new EventProperties(probView); - props.setDocTimeRel(label); - event.setProperties(props); - mention.setEvent(event); - mention.addToIndexes(); - } - } catch (CASException e) { - e.printStackTrace(); - throw new AnalysisEngineProcessException(e); - } - - } - - if (eventMention.getEvent() == null) { - Event event = new Event(jCas); - eventMention.setEvent(event); - EventProperties props = new EventProperties(jCas); - event.setProperties(props); - } - if( maxEntry != null){ - eventMention.getEvent().getProperties().setDocTimeRel(maxEntry.getKey()); - eventMention.getEvent().setConfidence(maxEntry.getValue().floatValue()); -// System.out.println("event DocTimeRel confidence:"+maxEntry.getValue().floatValue()); - } - } - } - } + public static AnalysisEngineDescription createDataWriterDescription( + Class> dataWriterClass, + File outputDirectory) throws ResourceInitializationException { + return AnalysisEngineFactory.createEngineDescription( + DocTimeRelAnnotator.class, + CleartkAnnotator.PARAM_IS_TRAINING, + true, + DefaultDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, + dataWriterClass, + DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, + outputDirectory); + } + + public static AnalysisEngineDescription createAnnotatorDescription(String modelPath) + throws ResourceInitializationException { + return AnalysisEngineFactory.createEngineDescription( + 
DocTimeRelAnnotator.class, + CleartkAnnotator.PARAM_IS_TRAINING, + false, + GenericJarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH, + modelPath); + } + + /** + * @deprecated use String path instead of File. + * ClearTK will automatically Resolve the String to an InputStream. + * This will allow resources to be read within from a jar as well as File. + */ + @Deprecated + public static AnalysisEngineDescription createAnnotatorDescription(File modelDirectory) + throws ResourceInitializationException { + return AnalysisEngineFactory.createEngineDescription( + DocTimeRelAnnotator.class, + CleartkAnnotator.PARAM_IS_TRAINING, + false, + GenericJarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH, + new File(modelDirectory, "model.jar")); + } + + private CleartkExtractor contextExtractor; + private CleartkExtractor tokenVectorContext; + private ContinuousTextExtractor continuousText; + private SectionHeaderExtractor sectionIDExtractor; + private ClosestVerbExtractor closestVerbExtractor; + private TimeXExtractor timeXExtractor; + private EventPropertyExtractor genericExtractor; + private UmlsSingleFeatureExtractor umlsExtractor; + private NearbyVerbTenseXExtractor verbTensePatternExtractor; + + // private DateAndMeasurementExtractor dateExtractor; + // private CoveredTextToValuesExtractor disSemExtractor; + // private DurationExpectationFeatureExtractor durationExtractor; + + public static final String PARAM_PROB_VIEW = "ProbView"; + @ConfigurationParameter(name=PARAM_PROB_VIEW, mandatory=false) + private String probViewname = null; + + @Override + public void initialize(UimaContext context) throws ResourceInitializationException { + super.initialize(context); + CombinedExtractor1 baseExtractor = new CombinedExtractor1<>( + new CoveredTextExtractor(), + new TypePathExtractor<>(BaseToken.class, "partOfSpeech")); + this.contextExtractor = new CleartkExtractor<>( + BaseToken.class, + baseExtractor, + new Preceding(3), + new Covered(), + new Following(3)); + final String vectorFile = 
"org/apache/ctakes/temporal/mimic_vectors.txt"; + try { + this.continuousText = new ContinuousTextExtractor(vectorFile); + } catch (CleartkExtractorException e) { + System.err.println("cannot find file: "+ vectorFile); + e.printStackTrace(); + } + this.tokenVectorContext = new CleartkExtractor<>( + BaseToken.class, + continuousText, + //new Preceding(5), + new Covered()); + //new Following(5)); + this.sectionIDExtractor = new SectionHeaderExtractor(); + this.closestVerbExtractor = new ClosestVerbExtractor(); + this.timeXExtractor = new TimeXExtractor(); + this.genericExtractor = new EventPropertyExtractor(); + this.umlsExtractor = new UmlsSingleFeatureExtractor(); + this.verbTensePatternExtractor = new NearbyVerbTenseXExtractor(); + + // this.dateExtractor = new DateAndMeasurementExtractor(); + + // try { + // Map word_disSem = CoveredTextToValuesExtractor.parseTextDoublesMap(new File("src/main/resources/embeddings.size25.txt"), Charsets.UTF_8); + // this.disSemExtractor = new CoveredTextToValuesExtractor("DisSemFeat", word_disSem); + // } catch (IOException e) { + // e.printStackTrace(); + // } + // this.durationExtractor = new DurationExpectationFeatureExtractor(); + } + + @Override + public void process(JCas jCas) throws AnalysisEngineProcessException { + for (EventMention eventMention : JCasUtil.select(jCas, EventMention.class)) { + List sents = JCasUtil.selectCovering(jCas, Sentence.class, eventMention); + List features = new ArrayList<>(); + if(sents!=null && sents.size()>0){ + features.addAll(this.contextExtractor.extractWithin(jCas, eventMention, sents.get(0))); + features.addAll(this.tokenVectorContext.extractWithin(jCas, eventMention, sents.get(0))); + }else{ + features.addAll(this.contextExtractor.extract(jCas, eventMention)); + features.addAll(this.tokenVectorContext.extract(jCas, eventMention)); + } + + features.addAll(this.sectionIDExtractor.extract(jCas, eventMention)); //add section heading + features.addAll(this.closestVerbExtractor.extract(jCas, 
eventMention)); //add closest verb + features.addAll(this.timeXExtractor.extract(jCas, eventMention)); //add the closest time expression types + features.addAll(this.genericExtractor.extract(jCas, eventMention)); //add the closest time expression types + features.addAll(this.umlsExtractor.extract(jCas, eventMention)); //add umls features + features.addAll(this.verbTensePatternExtractor.extract(jCas, eventMention));//add nearby verb POS pattern feature + + // + // features.addAll(this.dateExtractor.extract(jCas, eventMention)); //add the closest NE type + // features.addAll(this.durationExtractor.extract(jCas, eventMention)); //add duration feature + // features.addAll(this.disSemExtractor.extract(jCas, eventMention)); //add distributional semantic features + if (this.isTraining()) { + if(eventMention.getEvent() != null){ + String outcome = eventMention.getEvent().getProperties().getDocTimeRel(); + this.dataWriter.write(new Instance<>(outcome, features)); + } + } else { + // String outcome = this.classifier.classify(features); + Map scores = this.classifier.score(features); + Map.Entry maxEntry = null; + for( Map.Entry entry: scores.entrySet() ){ + if(maxEntry == null || entry.getValue().compareTo(maxEntry.getValue()) > 0){ + maxEntry = entry; + } + } + + if (probViewname != null){ + Map probs = SoftMaxUtil.getDistributionFromScores(scores); + try { + JCas probView = jCas.getView(probViewname); + for(String label : probs.keySet()){ + EventMention mention = new EventMention(probView); + mention.setId(eventMention.getId()); + mention.setConfidence(probs.get(label).floatValue()); + Event event = new Event(probView); + EventProperties props = new EventProperties(probView); + props.setDocTimeRel(label); + event.setProperties(props); + mention.setEvent(event); + mention.addToIndexes(); + } + } catch (CASException e) { + e.printStackTrace(); + throw new AnalysisEngineProcessException(e); + } + + } + + if (eventMention.getEvent() == null) { + Event event = new Event(jCas); 
+ eventMention.setEvent(event); + EventProperties props = new EventProperties(jCas); + event.setProperties(props); + } + if( maxEntry != null){ + eventMention.getEvent().getProperties().setDocTimeRel(maxEntry.getKey()); + eventMention.getEvent().setConfidence(maxEntry.getValue().floatValue()); + // System.out.println("event DocTimeRel confidence:"+maxEntry.getValue().floatValue()); + } + } + } + } } Added: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/WithinSentenceBeforeRelationAnnotator.java URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/WithinSentenceBeforeRelationAnnotator.java?rev=1735302&view=auto ============================================================================== --- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/WithinSentenceBeforeRelationAnnotator.java (added) +++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/WithinSentenceBeforeRelationAnnotator.java Wed Mar 16 21:20:21 2016 @@ -0,0 +1,159 @@ +package org.apache.ctakes.temporal.ae; + +import java.sql.Timestamp; +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.Date; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +import org.apache.ctakes.temporal.duration.Utils; +import org.apache.ctakes.typesystem.type.relation.RelationArgument; +import org.apache.ctakes.typesystem.type.relation.TemporalTextRelation; +import org.apache.ctakes.typesystem.type.syntax.NewlineToken; +import org.apache.ctakes.typesystem.type.textsem.EventMention; +import org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation; +import org.apache.ctakes.typesystem.type.textsem.TimeMention; +import org.apache.uima.analysis_engine.AnalysisEngineProcessException; +import org.apache.uima.fit.component.JCasAnnotator_ImplBase; +import org.apache.uima.fit.util.JCasUtil; +import 
org.apache.uima.jcas.JCas; +import org.apache.uima.jcas.tcas.Annotation; + +public class WithinSentenceBeforeRelationAnnotator extends JCasAnnotator_ImplBase { + + @Override + public void process(JCas jCas) throws AnalysisEngineProcessException { + //1. find all timex that can be normalized to timestamp. form a map of timex-timestamp + Map timeNorm = new HashMap<>(); + //find docTime: + TimeMention docTime = null; + for (TimeMention timex : JCasUtil.select(jCas, TimeMention.class)) { + if(timex.getTimeClass().equals("DOCTIME")){ + docTime = timex; + break; + } + } + if(docTime != null){ + for (TimeMention timex : JCasUtil.select(jCas, TimeMention.class)) { + if(!timex.getTimeClass().equals("DOCTIME")){ + //add normalized timex + String value = Utils.getTimexMLValue(timex.getCoveredText(), docTime.getCoveredText()); + if(value != null){ + try{ + SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd"); + Date parsedDate = dateFormat.parse(value); + Timestamp timestamp = new Timestamp(parsedDate.getTime()); + timeNorm.put(timex, timestamp); + }catch(Exception e){//this generic but you can control another types of exception + System.out.println("cannot parse timex :" + value); + continue; + } + } + } + } + } + + //find all timex that involved in temporal relations: + Set relationalTimex = new HashSet<>(); + for(RelationArgument relarg: JCasUtil.select(jCas, RelationArgument.class)){ + Annotation arg = relarg.getArgument(); + if(arg instanceof TimeMention){ + relationalTimex.add((TimeMention) arg); + } + } + relationalTimex.retainAll(timeNorm.keySet()); + + //find all events that contained by those timex + Map> timeEvents = new HashMap<>(); + for(TimeMention time : relationalTimex){ + Set containedEvents = new HashSet<>(); + for (TemporalTextRelation rel : JCasUtil.select(jCas, TemporalTextRelation.class)){ + if(rel.getCategory().equals("CONTAINS")){ + if(rel.getArg1().getArgument()==time){ + containedEvents.add((EventMention)rel.getArg2().getArgument()); + } + 
}else if(rel.getCategory().equals("CONTAINS-1")){ + if(rel.getArg2().getArgument()==time){ + containedEvents.add((EventMention)rel.getArg1().getArgument()); + } + } + } + timeEvents.put(time, containedEvents); + } + + //iterate the List of timx, find all pairs of timex that are in the same line -- i.e. there is no newLine in between + int timexNum = relationalTimex.size(); + List timexLst = new ArrayList<>(relationalTimex); + for (int i=0; i< timexNum-1; i++){ + TimeMention timeA = timexLst.get(i); + for(int j= i+1; j < timexNum; j++){ + TimeMention timeB = timexLst.get(j); + //check if timeA and timeB are in the same line: + if(timeA!=timeB && JCasUtil.selectBetween(jCas, NewlineToken.class, timeA, timeB).isEmpty()){ + Timestamp stampA = timeNorm.get(timeA); + Timestamp stampB = timeNorm.get(timeB); + int compareResult =stampA.compareTo(stampB); + if(compareResult<0){//if before + for(EventMention eventA : timeEvents.get(timeA)){ + for(EventMention eventB: timeEvents.get(timeB)){ + if(eventA != eventB){ + createRelation(jCas, eventA, eventB, "BEFORE", 1d); + } + } + } + + }else if(compareResult>0){//if after + for(EventMention eventB : timeEvents.get(timeB)){ + for(EventMention eventA: timeEvents.get(timeA)){ + if(eventA != eventB){ + createRelation(jCas, eventB, eventA, "BEFORE", 1d); + } + } + } + + }else{//if they are equal + Set groupA = new HashSet<>(); + groupA.addAll(timeEvents.get(timeA)); + groupA.removeAll(timeEvents.get(timeB)); + for(EventMention event: groupA){ + createRelation(jCas, timeB, event, "CONTAINS", 1d); + } + Set groupB = new HashSet<>(); + groupB.addAll(timeEvents.get(timeB)); + groupB.removeAll(timeEvents.get(timeA)); + for(EventMention event: groupB){ + createRelation(jCas, timeA, event, "CONTAINS", 1d); + } + + } + + } + } + } + + //3. if so find the relationship between those two timestamp, then induce the relations of covered events. 
+ } + + protected void createRelation(JCas jCas, IdentifiedAnnotation arg1, + IdentifiedAnnotation arg2, String predictedCategory, double confidence) { + RelationArgument relArg1 = new RelationArgument(jCas); + relArg1.setArgument(arg1); + relArg1.setRole("Arg1"); + relArg1.addToIndexes(); + RelationArgument relArg2 = new RelationArgument(jCas); + relArg2.setArgument(arg2); + relArg2.setRole("Arg2"); + relArg2.addToIndexes(); + TemporalTextRelation relation = new TemporalTextRelation(jCas); + relation.setArg1(relArg1); + relation.setArg2(relArg2); + relation.setCategory(predictedCategory); + relation.setConfidence(confidence); + relation.addToIndexes(); + } + +} Added: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/ContinuousTextExtractor.java URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/ContinuousTextExtractor.java?rev=1735302&view=auto ============================================================================== --- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/ContinuousTextExtractor.java (added) +++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/ContinuousTextExtractor.java Wed Mar 16 21:20:21 2016 @@ -0,0 +1,63 @@ +package org.apache.ctakes.temporal.ae.feature; + +import java.io.File; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Scanner; + +import org.apache.ctakes.core.resource.FileLocator; +import org.apache.ctakes.typesystem.type.syntax.BaseToken; +import org.apache.ctakes.utils.distsem.WordEmbeddings; +import org.apache.ctakes.utils.distsem.WordVector; +import org.apache.ctakes.utils.distsem.WordVectorReader; +import org.apache.uima.jcas.JCas; +import org.cleartk.ml.Feature; +import org.cleartk.ml.feature.extractor.CleartkExtractorException; 
+import org.cleartk.ml.feature.extractor.NamedFeatureExtractor1;
+
+public class ContinuousTextExtractor implements
+NamedFeatureExtractor1<BaseToken> {
+  private int dims;
+  private WordEmbeddings words = null;
+  public ContinuousTextExtractor(String vecFile) throws
+  CleartkExtractorException {
+    super();
+    try {
+      words =
+          WordVectorReader.getEmbeddings(FileLocator.getAsStream(vecFile));
+    } catch (IOException e) {
+      e.printStackTrace();
+      throw new CleartkExtractorException(e);
+    }
+  }
+  @Override
+  public List<Feature> extract(JCas view, BaseToken token) throws
+  CleartkExtractorException {
+    List<Feature> feats = new ArrayList<>();
+
+    String wordText = token.getCoveredText();
+    WordVector vec = null;
+    if(words.containsKey(wordText)){
+      vec = words.getVector(wordText);
+    }else if(words.containsKey(wordText.toLowerCase())){
+      vec = words.getVector(wordText.toLowerCase());
+    }else{
+      return feats;
+    }
+
+    for(int i = 0; i < vec.size(); i++){
+      feats.add(new Feature(getFeatureName() + "_" + i, vec.getValue(i)));
+    }
+    return feats;
+  }
+
+  @Override
+  public String getFeatureName() {
+    return "ContinuousText";
+  }
+
+}

Modified: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/duration/Utils.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/duration/Utils.java?rev=1735302&r1=1735301&r2=1735302&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/duration/Utils.java (original)
+++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/duration/Utils.java Wed Mar 16 21:20:21 2016
@@ -133,6 +133,47 @@ public class Utils {
   }
 
   /**
+   * Use Bethard normalizer to get TimeML value.
+   */
+  public static String getTimexMLValue(String timex) {
+
+    URL grammarURL = DurationEventTimeFeatureExtractor.class.getResource("/info/bethard/timenorm/en.grammar");
+    TemporalExpressionParser parser = new TemporalExpressionParser(grammarURL, DefaultTokenizer$.MODULE$);
+    TimeSpan anchor = TimeSpan.of(2013, 12, 16);
+    Try<Temporal> result = parser.parse(timex, anchor);
+
+    String value = null;
+    if (result.isSuccess()) {
+      Temporal temporal = result.get();
+
+      value = temporal.timeMLValue();
+    }
+
+    return value;
+  }
+
+  /**
+   * Use Bethard normalizer to get TimeML value.
+   */
+  public static String getTimexMLValue(String timex, String anchorStr) {
+
+    String anchstr = getTimexMLValue(anchorStr);
+    URL grammarURL = DurationEventTimeFeatureExtractor.class.getResource("/info/bethard/timenorm/en.grammar");
+    TemporalExpressionParser parser = new TemporalExpressionParser(grammarURL, DefaultTokenizer$.MODULE$);
+    TimeSpan anchor = TimeSpan.fromTimeMLValue(anchstr);//.of(2013, 12, 16);
+    Try<Temporal> result = parser.parse(timex, anchor);
+
+    String value = null;
+    if (result.isSuccess()) {
+      Temporal temporal = result.get();
+
+      value = temporal.timeMLValue();
+    }
+
+    return value;
+  }
+
+  /**
    * Take the time unit from Bethard noramlizer
    * and return a coarser time unit, i.e. one of the eight bins.
    * Return null, if this cannot be done.
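
Note on how the normalized values are consumed: WithinSentenceBeforeRelationAnnotator passes the string returned by getTimexMLValue(timex, anchorStr) to a SimpleDateFormat("yyyy-MM-dd") parser and compares the resulting timestamps. The sketch below isolates just that comparison step so it can be tried without a CAS. It is an illustration only and not part of this revision: the class name TimexOrderSketch, the helpers parseDay and order, and the "AFTER"/"EQUAL" labels are hypothetical (the annotator itself emits a reversed BEFORE relation for the "after" case and CONTAINS relations for equal dates).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical illustration; not part of r1735302.
class TimexOrderSketch {

    // Parse a day-granularity TimeML value such as "2016-03-16".
    // Returns null when the value does not begin with a yyyy-MM-dd date,
    // mirroring the annotator's behaviour of skipping unparseable timex.
    static Date parseDay(String timeMLValue) {
        if (timeMLValue == null) {
            return null;
        }
        try {
            return new SimpleDateFormat("yyyy-MM-dd").parse(timeMLValue);
        } catch (ParseException e) {
            return null;
        }
    }

    // Returns "BEFORE", "AFTER" or "EQUAL" when both values parse, else null.
    static String order(String valueA, String valueB) {
        Date a = parseDay(valueA);
        Date b = parseDay(valueB);
        if (a == null || b == null) {
            return null;
        }
        int cmp = a.compareTo(b);
        return cmp < 0 ? "BEFORE" : (cmp > 0 ? "AFTER" : "EQUAL");
    }

    public static void main(String[] args) {
        System.out.println(order("2016-03-14", "2016-03-16")); // BEFORE
        System.out.println(order("2016-03-16", "2016-03-16")); // EQUAL
        System.out.println(order("2016-W11", "2016-03-16"));   // null (week value is not yyyy-MM-dd)
    }
}
```

Values that are not full dates (weeks, months, durations) fall out as null here, which corresponds to the "cannot parse timex" branch in the annotator.
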
Modified: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfBothEEAndETRelations.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfBothEEAndETRelations.java?rev=1735302&r1=1735301&r2=1735302&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfBothEEAndETRelations.java (original)
+++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfBothEEAndETRelations.java Wed Mar 16 21:20:21 2016
@@ -34,14 +34,18 @@ import java.util.Map;
 import java.util.Set;
 import org.apache.ctakes.relationextractor.eval.RelationExtractorEvaluation.HashableArguments;
+import org.apache.ctakes.temporal.ae.CrossSentenceTemporalRelationAnnotator;
 import org.apache.ctakes.temporal.ae.DocTimeRelAnnotator;
 import org.apache.ctakes.temporal.ae.EventEventRelationAnnotator;
 import org.apache.ctakes.temporal.ae.EventTimeSelfRelationAnnotator;
 import org.apache.ctakes.temporal.ae.TemporalRelationExtractorAnnotator;
+import org.apache.ctakes.temporal.ae.WithinSentenceBeforeRelationAnnotator;
 //import org.apache.ctakes.temporal.ae.EventTimeSyntacticAnnotator;
 //import org.apache.ctakes.temporal.ae.EventTimeRelationAnnotator;
 //import org.apache.ctakes.temporal.ae.EventEventRelationAnnotator;
 import org.apache.ctakes.temporal.ae.baselines.RecallBaselineEventTimeRelationAnnotator;
+import org.apache.ctakes.temporal.eval.EvaluationOfEventEventThymeRelations.AddEEPotentialRelations;
+import org.apache.ctakes.temporal.eval.EvaluationOfEventTimeRelations.AddPotentialRelations;
 import org.apache.ctakes.temporal.eval.EvaluationOfEventTimeRelations.Overlap2Contains;
 import org.apache.ctakes.temporal.eval.EvaluationOfEventTimeRelations.ParameterSettings;
 //import org.apache.ctakes.temporal.eval.Evaluation_ImplBase.WriteI2B2XML;
@@ -111,7 +115,7 @@ EvaluationOfTemporalRelations_ImplBase{
     @Option
     public boolean getSkipTrain();
-
+
     @Option
     public boolean getWriteProbabilities();
   }
@@ -199,28 +203,28 @@ EvaluationOfTemporalRelations_ImplBase{
         evaluation.prepareXMIsFor(patientSets);
       }
       evaluation.writeProbabilities = options.getWriteProbabilities();
-
+
       params.stats = evaluation.trainAndTest(training, testing);//training);//
       // System.err.println(options.getKernelParams() == null ? params : options.getKernelParams());
-// System.err.println("No closure on gold::Closure on System::Recall Mode");
+      // System.err.println("No closure on gold::Closure on System::Recall Mode");
       System.err.println(params.stats);
       //do closure on gold, but not on system, to calculate precision
-// evaluation.skipTrain = true;
-// recallModeEvaluation = false;
-// params.stats = evaluation.trainAndTest(training, testing);//training);//
-// // System.err.println(options.getKernelParams() == null ? params : options.getKernelParams());
-// System.err.println("No closure on System::Closure on Gold::Precision Mode");
-// System.err.println(params.stats);
-//
-// //do closure on train, but not on test, to calculate plain results
-// evaluation.skipTrain = true;
-// evaluation.useClosure = false;
-// // evaluation.printErrors = false;
-// params.stats = evaluation.trainAndTest(training, testing);//training);//
-// // System.err.println(options.getKernelParams() == null ? params : options.getKernelParams());
-// System.err.println("Closure on train::No closure on Test::Plain Mode");
-// System.err.println(params.stats);
+      // evaluation.skipTrain = true;
+      // recallModeEvaluation = false;
+      // params.stats = evaluation.trainAndTest(training, testing);//training);//
+      // // System.err.println(options.getKernelParams() == null ? params : options.getKernelParams());
+      // System.err.println("No closure on System::Closure on Gold::Precision Mode");
+      // System.err.println(params.stats);
+      //
+      // //do closure on train, but not on test, to calculate plain results
+      // evaluation.skipTrain = true;
+      // evaluation.useClosure = false;
+      // // evaluation.printErrors = false;
+      // params.stats = evaluation.trainAndTest(training, testing);//training);//
+      // // System.err.println(options.getKernelParams() == null ? params : options.getKernelParams());
+      // System.err.println("Closure on train::No closure on Test::Plain Mode");
+      // System.err.println(params.stats);
       if(options.getUseTmp()){
         // won't work because it's not empty. should we be concerned with this or is it responsibility of
@@ -239,7 +243,7 @@ EvaluationOfTemporalRelations_ImplBase{
   protected boolean useGoldAttributes;
   protected boolean skipTrain=false;
   // protected boolean printRelations = false;
-  private boolean writeProbabilities = false;
+  private boolean writeProbabilities = false;
   public EvaluationOfBothEEAndETRelations(
       File baseDirectory,
@@ -303,15 +307,15 @@ EvaluationOfTemporalRelations_ImplBase{
     // aggregateBuilder.add(AnalysisEngineFactory.createPrimitiveDescription(AddFlippedOverlap.class));//add flipped overlap instances to training data
     aggregateBuilder.add(AnalysisEngineFactory.createPrimitiveDescription(RemoveNonTLINKRelations.class));//remove non tlink relations, such as alinks
-
+
     aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(Overlap2Contains.class));
     // aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(PreserveEventEventRelations.class));
     // aggregateBuilder.add(AnalysisEngineFactory.createPrimitiveDescription(RemoveNonUMLSEvents.class));
     //add unlabeled nearby system events as potential links:
-// aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(AddEEPotentialRelations.class));
-// aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(AddPotentialRelations.class));
+    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(AddEEPotentialRelations.class));
+    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(AddPotentialRelations.class));
     aggregateBuilder.add(EventEventRelationAnnotator.createDataWriterDescription(
         LibLinearStringOutcomeDataWriter.class,
@@ -349,8 +353,8 @@ EvaluationOfTemporalRelations_ImplBase{
     }
     // HideOutput hider = new HideOutput();
-    JarClassifierBuilder.trainAndPackage(new File(directory,"event-event"),"-w2","10","-w3","86","-c", "0.003");//"-c", "0.05");//"0.08","-w3","3","-w4","17","-w5","20","-w6","16","-w7","10","-w8","6", "-w9","45","-w10","30","-c", optArray[1]);//"-c", "0.05");//optArray);
-    JarClassifierBuilder.trainAndPackage(new File(directory,"event-time"), "-w3","2","-w4","19","-w5","13","-w6","22","-w7","96","-w8","18","-c", "0.0007"); //"-w3","2","-w4","19","-w5","13","-w6","22","-w7","96","-w8","18","-c", optArray[1]);//"-w4","18","-w5","14","-w6","21","-w7","100","-w8","19","-c", optArray[1]);//"0.05");//"-h","0","-c", "1000");//optArray);
+    JarClassifierBuilder.trainAndPackage(new File(directory,"event-event"),"-w2","10","-w3","86","-c", optArray[1]);//"-c", "0.05");//"0.08","-w3","3","-w4","17","-w5","20","-w6","16","-w7","10","-w8","6", "-w9","45","-w10","30","-c", optArray[1]);//"-c", "0.05");//optArray);
+    JarClassifierBuilder.trainAndPackage(new File(directory,"event-time"), "-w3","2","-w4","19","-w5","13","-w6","22","-w7","96","-w8","18","-c", optArray[1]); //"-w3","2","-w4","19","-w5","13","-w6","22","-w7","96","-w8","18","-c", optArray[1]);//"-w4","18","-w5","14","-w6","21","-w7","100","-w8","19","-c", optArray[1]);//"0.05");//"-h","0","-c", "1000");//optArray);
     // JarClassifierBuilder.trainAndPackage(new File(directory,"event-event"), "-h","0","-c", "1000");
     // hider.restoreOutput();
@@ -362,10 +366,10 @@ EvaluationOfTemporalRelations_ImplBase{
       throws Exception {
     this.useClosure=false;
     AggregateBuilder aggregateBuilder = this.getPreprocessorAggregateBuilder();
-    aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription(
-        ViewCreatorAnnotator.class,
-        ViewCreatorAnnotator.PARAM_VIEW_NAME,
-        PROB_VIEW_NAME ) );
+    aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription(
+        ViewCreatorAnnotator.class,
+        ViewCreatorAnnotator.PARAM_VIEW_NAME,
+        PROB_VIEW_NAME ) );
     aggregateBuilder.add(CopyFromGold.getDescription(EventMention.class, TimeMention.class));
@@ -381,13 +385,13 @@ EvaluationOfTemporalRelations_ImplBase{
     // AnalysisEngineFactory.createEngineDescription(PreserveEventEventRelations.class),
     // CAS.NAME_DEFAULT_SOFA,
     // GOLD_VIEW_NAME);
-
+
     //remove non-tlink relations, such as alinks
     aggregateBuilder.add(
         AnalysisEngineFactory.createEngineDescription(RemoveNonTLINKRelations.class),
         CAS.NAME_DEFAULT_SOFA,
         GOLD_VIEW_NAME);
-
+
     // aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(RemoveNonUMLSEvents.class));
     if (!recallModeEvaluation && this.useClosure) { //closure for gold
@@ -396,46 +400,49 @@ EvaluationOfTemporalRelations_ImplBase{
           CAS.NAME_DEFAULT_SOFA,
           GOLD_VIEW_NAME);
     }
-
+
     aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(RemoveNonContainsRelations.class), CAS.NAME_DEFAULT_SOFA, GOLD_VIEW_NAME);
     aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(RemoveRelations.class));
     AnalysisEngineDescription aed = this.baseline ? RecallBaselineEventTimeRelationAnnotator.createAnnotatorDescription(directory) :
-      EventEventRelationAnnotator.createAnnotatorDescription((new File(directory,"event-event/model.jar")).getAbsolutePath());
-    if(this.writeProbabilities){
-      ConfigurationParameterFactory.addConfigurationParameter(aed,
-          TemporalRelationExtractorAnnotator.PARAM_PROB_VIEW,
-          PROB_VIEW_NAME);
-    }
+      EventEventRelationAnnotator.createAnnotatorDescription((new File(directory,"event-event/model.jar")).getAbsolutePath());
+    if(this.writeProbabilities){
+      ConfigurationParameterFactory.addConfigurationParameter(aed,
+          TemporalRelationExtractorAnnotator.PARAM_PROB_VIEW,
+          PROB_VIEW_NAME);
+    }
     aggregateBuilder.add(aed);
     aed = EventTimeSelfRelationAnnotator.createEngineDescription(new File(directory,"event-time/model.jar").getAbsolutePath());
     if(this.writeProbabilities){
-      ConfigurationParameterFactory.addConfigurationParameter(aed,
-          TemporalRelationExtractorAnnotator.PARAM_PROB_VIEW,
-          PROB_VIEW_NAME);
+      ConfigurationParameterFactory.addConfigurationParameter(aed,
+          TemporalRelationExtractorAnnotator.PARAM_PROB_VIEW,
+          PROB_VIEW_NAME);
     }
     aggregateBuilder.add(aed);
-
+
     aed = DocTimeRelAnnotator.createAnnotatorDescription(new File("target/eval/event-properties/train_and_test/docTimeRel/model.jar").getAbsolutePath());
     if(this.writeProbabilities){
-      ConfigurationParameterFactory.addConfigurationParameters(
-          aed,
-          DocTimeRelAnnotator.PARAM_PROB_VIEW,
-          PROB_VIEW_NAME);
+      ConfigurationParameterFactory.addConfigurationParameters(
+          aed,
+          DocTimeRelAnnotator.PARAM_PROB_VIEW,
+          PROB_VIEW_NAME);
     }
     aggregateBuilder.add(aed);
+
+    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(CrossSentenceTemporalRelationAnnotator.class));
+    aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(WithinSentenceBeforeRelationAnnotator.class));
     if(this.anaforaOutput != null){
-      aed = AnalysisEngineFactory.createEngineDescription(WriteAnaforaXML.class, WriteAnaforaXML.PARAM_OUTPUT_DIR, this.anaforaOutput);
-      if(this.writeProbabilities){
-        ConfigurationParameterFactory.addConfigurationParameters(
-            aed,
-            WriteAnaforaXML.PARAM_PROB_VIEW,
-            PROB_VIEW_NAME);
-      }
-      aggregateBuilder.add(aed, "TimexView", CAS.NAME_DEFAULT_SOFA);
+      aed = AnalysisEngineFactory.createEngineDescription(WriteAnaforaXML.class, WriteAnaforaXML.PARAM_OUTPUT_DIR, this.anaforaOutput);
+      if(this.writeProbabilities){
+        ConfigurationParameterFactory.addConfigurationParameters(
+            aed,
+            WriteAnaforaXML.PARAM_PROB_VIEW,
+            PROB_VIEW_NAME);
+      }
+      aggregateBuilder.add(aed, "TimexView", CAS.NAME_DEFAULT_SOFA);
     }
     File outf = null;
@@ -572,7 +579,7 @@ EvaluationOfTemporalRelations_ImplBase{
     }
   }
-
+
   static void createRelation(JCas jCas, Annotation arg1, Annotation arg2, String category) {
     RelationArgument relArg1 = new RelationArgument(jCas);

Modified: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/Evaluation_ImplBase.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/Evaluation_ImplBase.java?rev=1735302&r1=1735301&r2=1735302&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/Evaluation_ImplBase.java (original)
+++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/Evaluation_ImplBase.java Wed Mar 16 21:20:21 2016
@@ -30,7 +30,7 @@ import org.apache.ctakes.contexttokenize
 import org.apache.ctakes.core.ae.OverlapAnnotator;
 import org.apache.ctakes.core.ae.SentenceDetector;
 import org.apache.ctakes.core.ae.TokenizerAnnotatorPTB;
-import org.apache.ctakes.core.cleartk.ae.SentenceDetectorAnnotator;
+//import org.apache.ctakes.core.cleartk.ae.SentenceDetectorAnnotator;
 import org.apache.ctakes.core.resource.FileLocator;
 import org.apache.ctakes.dependency.parser.ae.ClearNLPDependencyParserAE;
 import org.apache.ctakes.dependency.parser.ae.ClearNLPSemanticRoleLabelerAE;
@@ -474,11 +474,11 @@ public abstract class Evaluation_ImplBas
         .add( AnalysisEngineFactory.createEngineDescription( SegmentsFromBracketedSectionTagsAnnotator.class ) );
     // identify sentences
-// aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription(
-// SentenceDetector.class,
-// SentenceDetector.SD_MODEL_FILE_PARAM,
-// "org/apache/ctakes/core/sentdetect/sd-med-model.zip" ) );
-    aggregateBuilder.add(SentenceDetectorAnnotator.getDescription(FileLocator.locateFile("org/apache/ctakes/core/sentdetect/model.jar").getPath()));
+    aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription(
+        SentenceDetector.class,
+        SentenceDetector.SD_MODEL_FILE_PARAM,
+        "org/apache/ctakes/core/sentdetect/sd-med-model.zip" ) );
+// aggregateBuilder.add(SentenceDetectorAnnotator.getDescription(FileLocator.locateFile("org/apache/ctakes/core/sentdetect/model.jar").getPath()));
     // identify tokens
     aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription( TokenizerAnnotatorPTB.class ) );