Subject: svn commit: r1734445 - in /ctakes/sandbox/ctakes-clinical-deid: ./ src/main/resources/META-INF/org.apache.uima.fit/ src/main/ruta/org/apache/ctakes/deid/ src/test/java/org/apache/ctakes/deid/ src/test/resources/org/apache/ctakes/deid/
Date: Thu, 10 Mar 2016 18:52:49 -0000
To: commits@ctakes.apache.org
From: chenpei@apache.org
Reply-To: dev@ctakes.apache.org
Message-Id: <20160310185249.AADAD3A0318@svn01-us-west.apache.org>

Author: chenpei
Date: Thu Mar 10 18:52:48 2016
New Revision: 1734445

URL: http://svn.apache.org/viewvc?rev=1734445&view=rev
Log:
CTAKES-384 Applying patch. Thanks Peter Klugl.

Modified:
    ctakes/sandbox/ctakes-clinical-deid/pom.xml
    ctakes/sandbox/ctakes-clinical-deid/src/main/resources/META-INF/org.apache.uima.fit/types.txt
    ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Deid.ruta
    ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Dictionaries.ruta
    ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/UserName.ruta
    ctakes/sandbox/ctakes-clinical-deid/src/test/java/org/apache/ctakes/deid/DeidPipelineTest.java
    ctakes/sandbox/ctakes-clinical-deid/src/test/resources/org/apache/ctakes/deid/examples.csv

Modified: ctakes/sandbox/ctakes-clinical-deid/pom.xml
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/pom.xml?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/pom.xml (original)
+++ ctakes/sandbox/ctakes-clinical-deid/pom.xml Thu Mar 10 18:52:48 2016
@@ -11,24 +11,10 @@
+ 2.8.1
 2.4.0
-
-
-
-
- staged-release
- https://repository.apache.org/content/repositories/orgapacheuima-1081/
-
-
-
-
- staged-release
- https://repository.apache.org/content/repositories/orgapacheuima-1081/
-
-
-
 org.apache.ctakes
@@ -45,9 +31,9 @@
 ${ruta-version}
-
-
-
+
+
+
@@ -60,7 +46,7 @@
 target/generated-sources/ruta/descriptor
-
+
 org.apache.uima
@@ -69,44 +55,40 @@
 descriptors
-
-
+
+
 generate-resources
 generate
-
-
+
+
 ${basedir}/src/main/ruta
 **/*.ruta
-
+
-
+
 ${project.build.directory}/generated-sources/ruta/descriptor
-
+
 ${project.build.directory}/generated-sources/ruta/descriptor
-
+
-
+
-
+
 src/main/resources/template/BasicEngine.xml
@@ -147,8 +129,8 @@
 false
-
+
 -1
@@ -166,8 +148,8 @@
 script:src/main/ruta/
-
-
+
+
@@ -192,6 +174,40 @@
+
+ org.codehaus.mojo
+ jaxb2-maven-plugin
+ 2.2
+
+
+ xjc
+
+ xjc
+
+
+
+
+ org.apache.ctakes.deid.i2b2
+
+
+
+ org.apache.uima
+ jcasgen-maven-plugin
+ ${uima-version}
+
+
+
+ generate
+
+
+
+ src/main/resources/org/apache/ctakes/deid/types/TypeSystem.xml
+
+ true
+
+
+
+
@@ -202,8 +218,8 @@
 2.8.1
- xml-apis
- xml-apis
+ xml-apis
+ xml-apis
 1.4.01

Modified: ctakes/sandbox/ctakes-clinical-deid/src/main/resources/META-INF/org.apache.uima.fit/types.txt
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/main/resources/META-INF/org.apache.uima.fit/types.txt?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/main/resources/META-INF/org.apache.uima.fit/types.txt (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/main/resources/META-INF/org.apache.uima.fit/types.txt Thu Mar 10 18:52:48 2016
@@ -7,4 +7,5 @@ classpath*:org/apache/ctakes/drugner/typ
 classpath*:org/apache/ctakes/padtermspotter/types/TypeSystem.xml
 classpath*:org/apache/ctakes/smokingstatus/types/TypeSystem.xml
 classpath*:org/apache/ctakes/sideeffect/types/TypeSystem.xml
+classpath*:org/apache/ctakes/deid/types/TypeSystem.xml
 classpath*:org/apache/ctakes/deid/DeidRutaTypeSystem.xml
\ No newline at end of file

Modified: ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Deid.ruta
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Deid.ruta?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Deid.ruta (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Deid.ruta Thu Mar 10 18:52:48 2016
@@ -1,22 +1,46 @@
 PACKAGE org.apache.ctakes.deid;
-TYPESYSTEM org.apache.ctakes.typesystem.types.TypeSystem;
+//TYPESYSTEM org.apache.ctakes.typesystem.types.TypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.types.TypeSystem;
+
+// UIMA-4833
+TYPESYSTEM org.apache.ctakes.deid.ZipStateRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.StreetRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.AgeRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.DoctorRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.UserNameRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.PhoneRutaTypeSystem;
+TYPESYSTEM org.apache.ctakes.deid.DateRutaTypeSystem;
+
 SCRIPT org.apache.ctakes.deid.Dictionaries;
+SCRIPT org.apache.ctakes.deid.Age;
+SCRIPT org.apache.ctakes.deid.Doctor;
 SCRIPT org.apache.ctakes.deid.ZipState;
 SCRIPT org.apache.ctakes.deid.Street;
 SCRIPT org.apache.ctakes.deid.UserName;
+SCRIPT org.apache.ctakes.deid.Phone;
+SCRIPT org.apache.ctakes.deid.Date;
 CALL(Dictionaries);
 CALL(ZipState);
 CALL(Street);
 CALL(UserName);
+CALL(Date);
+CALL(Age);
+CALL(Doctor);
+CALL(Phone);
+
+Zip{-> Location, Location.entityType = "ZIP"};
+State{-> Location, Location.entityType= "STATE"};
+Email{-> Contact, Contact.entityType = "EMAIL"};
+ProfessionInd{-> Profession, Profession.entityType = "PROFESSION"};
+Url{-> Contact, Contact.entityType = "URL"};
+Street{-> Location, Location.entityType= "STREET"};
+UserName{-> Name, Name.entityType = "USERNAME"};
+Age{-> Age.entityType = "AGE"};
+Doctor{-> Name, Name.entityType = "DOCTOR"};
+Phone{-> Contact, Contact.entityType = "PHONE"};
+Date{-> Date.entityType = "DATE"};
+
-// map types of ruta scripts to cTAKES types
-// TODO select the correct types and fill the features
-Zip{-> IdentifiedAnnotation};
-State{-> IdentifiedAnnotation};
-Email{-> IdentifiedAnnotation};
-Url{-> IdentifiedAnnotation};
-Street{-> IdentifiedAnnotation};
-UserName{-> IdentifiedAnnotation};

Modified: ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Dictionaries.ruta
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Dictionaries.ruta?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Dictionaries.ruta (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/Dictionaries.ruta Thu Mar 10 18:52:48 2016
@@ -2,19 +2,49 @@ PACKAGE org.apache.ctakes.deid;
 WORDLIST trie = 'generated.mtwl';
 DECLARE KeywordInd;
-DECLARE KeywordInd Profession, StateContext;
-DECLARE KeywordInd StreetInd, StreetFullInd;
+DECLARE KeywordInd ProfessionInd, StateContext, DeceasedInd, FamilyInd, MonthInd;
+DECLARE KeywordInd StreetInd, StreetFullInd, AgePostInd, AgePreInd, PhonePreInd;
 TRIE(
- "profession.txt" = Profession,
+ "profession.txt" = ProfessionInd,
 "us_state.txt" = StateContext,
 "us_state_acronym_abbreviation.txt" = StateContext,
 "street_ind.txt" = StreetInd,
 "street_full_ind.txt" = StreetFullInd,
+ "age_post_ind.txt" = AgePostInd,
+ "age_pre_ind.txt" = AgePreInd,
+ "deceased_ind.txt" = DeceasedInd,
+ "family_ind" = FamilyInd,
+ "phone_pre_ind" = PhonePreInd,
+ "month_ind" = MonthInd,
 trie, true, 4, false, 0, "-");
-
 DECLARE Url, Email;
 "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9- ]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.(com|org|edu|gov|mil|co\\.uk))" -> Email;
 "(https?://)?(www.)([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?(/[a-zA-Z0-9]+)?|(https?|ftp)://[^\\s/$.?#].[^\\s]*|www.[^\\s/$.?#].[^\\s]*" -> Url;
+DECLARE MDInd;
+"M\\.D\\."-> MDInd;
+
+DECLARE Num1, Num12, Num2, Num3, Num34, Num4, Num5;
+
+NUM->{
+ Document{REGEXP(".")-> Num1};
+ Document{REGEXP("..?")-> Num12};
+ Document{REGEXP("..")-> Num2};
+ Document{REGEXP("...")-> Num3};
+ Document{REGEXP("....?")-> Num34};
+ Document{REGEXP("....")-> Num4};
+ Document{REGEXP(".....")-> Num5};
+};
+
+DECLARE LParen, RParen, Dash, Slash;
+SPECIAL-> {
+ Document.ct=="("{-> LParen};
+ Document.ct==")"{-> RParen};
+ Document.ct=="-"{-> Dash};
+ Document.ct=="/"{-> Slash};
+};
+
+DECLARE ApoInd;
+(SPECIAL.ct=="'" SW.ct=="s"){-> ApoInd};
\ No newline at end of file

Modified: ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/UserName.ruta
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/UserName.ruta?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/UserName.ruta (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/main/ruta/org/apache/ctakes/deid/UserName.ruta Thu Mar 10 18:52:48 2016
@@ -1,4 +1,5 @@
 PACKAGE org.apache.ctakes.deid;
+TYPESYSTEM org.apache.ctakes.deid.DictionariesRutaTypeSystem;
 DECLARE UserName;
 //getUSERNAME 1
@@ -6,11 +7,7 @@ RETAINTYPE(WS);
 SPECIAL.ct=="[" (W{REGEXP(".{2,3}")} @NUM{REGEXP(".{1,3}")}){-> UserName} SPECIAL.ct=="]" ;
+MDInd WS+ W{REGEXP(".{2}"), -REGEXP("[Oo]n")} NUM{REGEXP(".{1,3}")->MARK(UserName,3,4)};
 RETAINTYPE;
 //getUSERNAME2
-DECLARE MDInd;
-"M\\.D\\."-> MDInd;
-MDInd W{REGEXP(".{2,3}")} NUM{REGEXP(".{1,3}")->MARK(UserName,2,3)};
-MDInd W{REGEXP("[Oo]n")} NUM{REGEXP(".{1,3}")->UNMARK(UserName,2,3)};
-W{REGEXP("[Oo]n")} @NUM{REGEXP(".{1,3}")->UNMARK(UserName,1,2)};

Modified: ctakes/sandbox/ctakes-clinical-deid/src/test/java/org/apache/ctakes/deid/DeidPipelineTest.java
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/test/java/org/apache/ctakes/deid/DeidPipelineTest.java?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/test/java/org/apache/ctakes/deid/DeidPipelineTest.java (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/test/java/org/apache/ctakes/deid/DeidPipelineTest.java Thu Mar 10 18:52:48 2016
@@ -23,9 +23,7 @@ import java.io.InputStreamReader;
 import java.net.URL;
 import java.util.Collection;
-import junit.framework.Assert;
-
-import org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation;
+import org.apache.ctakes.deid.type.DeidEntity;
 import org.apache.uima.fit.factory.AggregateBuilder;
 import org.apache.uima.fit.factory.AnalysisEngineFactory;
 import org.apache.uima.fit.factory.JCasFactory;
@@ -34,6 +32,8 @@ import org.apache.uima.fit.util.JCasUtil
 import org.apache.uima.jcas.JCas;
 import org.junit.Test;
+import junit.framework.Assert;
+
 public class DeidPipelineTest {
 private String descriptorPath = "target/generated-sources/ruta/descriptor/org/apache/ctakes/deid/DeidRutaAnnotator.xml";
@@ -58,11 +58,11 @@ public class DeidPipelineTest {
 jcas.setDocumentText(documentText);
 SimplePipeline.runPipeline(jcas, builder.createAggregateDescription());
- Collection<IdentifiedAnnotation> select = JCasUtil.select(jcas, IdentifiedAnnotation.class);
+ Collection<DeidEntity> select = JCasUtil.select(jcas, DeidEntity.class);
 Assert.assertEquals(documentText, split.length - 1, select.size());
 int counter = 1;
- for (IdentifiedAnnotation identifiedAnnotation : select) {
- String actual = identifiedAnnotation.getCoveredText();
+ for (DeidEntity each : select) {
+ String actual = each.getCoveredText();
 String expected = split[counter];
 Assert.assertEquals(expected, actual);
 counter++;

Modified: ctakes/sandbox/ctakes-clinical-deid/src/test/resources/org/apache/ctakes/deid/examples.csv
URL: http://svn.apache.org/viewvc/ctakes/sandbox/ctakes-clinical-deid/src/test/resources/org/apache/ctakes/deid/examples.csv?rev=1734445&r1=1734444&r2=1734445&view=diff
==============================================================================
--- ctakes/sandbox/ctakes-clinical-deid/src/test/resources/org/apache/ctakes/deid/examples.csv (original)
+++ ctakes/sandbox/ctakes-clinical-deid/src/test/resources/org/apache/ctakes/deid/examples.csv Thu Mar 10 18:52:48 2016
@@ -7,6 +7,6 @@ some text Mass 12345-1234 more text;Mass
 some text 742 Evergreen Terrace some text;742 Evergreen Terrace
 some text 742 Lower Evergreen Terrace some text;742 Lower Evergreen Terrace
 some text Evergreen street some text;Evergreen street
-some text M.D. abc 123 some text;abc 123
+some text M.D. ab123 some text;ab123
 some text [ab123] some text;ab123
 some text on 123 some text;
\ No newline at end of file