From: "Finan, Sean"
To: "dev@ctakes.apache.org"
Subject: RE: Help needed with document creation time/date
Date: Wed, 13 Jul 2016 19:00:02 +0000

Basically, you just want to create a TimeMention. Here is a short example:

    final String docText = jcas.getDocumentText();
    final Matcher dateMatcher = DATE_PATTERN.matcher( docText );
    if ( dateMatcher.matches() ) {
       final TimeMention docTime = new TimeMention( jcas );
       docTime.setBegin( dateMatcher.start( 1 ) );
       docTime.setEnd( dateMatcher.end( 2 ) );
       docTime.setId( 0 );
       docTime.addToIndexes();
    }

If you do want to use the org.cleartk.timeml.type.DocumentCreationTime class then you can do so. For later fetching and use, with a TimeMention you'll rely on the class type and id while on the DocumentCreationTime you can just use the class type.

Sean

-----Original Message-----
From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
Sent: Wednesday, July 13, 2016 2:47 PM
To: dev@ctakes.apache.org
Subject: Re: Help needed with document creation time/date

Thanks Sean. Great advice.

I have a regexNER, but didn't go that route because it looked as if there was an inbuilt mechanism waiting to be activated.

Say I know the time from some external source, is there a kosher way I can inject it into the CAS as a creation time property so that it can be retrieved later by a client that knows only the serialized CAS?

Peter

On 7/13/16, 11:41 AM, "Finan, Sean" wrote:

>Hi Peter,
>
>Our group has used two different approaches, depending upon the note type:
>1. Use a custom AE that creates creation time based upon a regex.
>This works well for notes that have a header or footer with a known format.
>2. Use the last normalized temporal expression.
>For my test notes this worked more frequently than you would think (~90%),
>but I would not go this route unless you have thoroughly thought about
>what is in your notes and how you are going to use the document creation
>time.
>
>That is all that we've done with respect to getting the creation time
>from the actual text. If you have any kind of structured data tied to
>the note that indicates date, then you can tie things (e.g. doctimerel,
>doctime) together post-process. We are doing this in one project.
>
>Sean
>
>-----Original Message-----
>From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
>Sent: Wednesday, July 13, 2016 2:33 PM
>To: dev@ctakes.apache.org
>Subject: Help needed with document creation time/date
>
>Hello All
>
>How can I get Ctakes to deduce the document creation datetime from the
>text. I have a pipeline including the following engines
>
>Basic Token Processing
>FastUMLS
>Zoner
>ClearNLPDependencyParserAE
>PolarityCleartkAnalysisEngine
>UncertaintyCleartkAnalysisEngine
>HistoryCleartkAnalysisEngine
>ConditionalCleartkAnalysisEngine
>GenericCleartkAnalysisEngine
>SubjectCleartkAnalysisEngine
>EventAnnotator
>AnalysisEngineFactory.createEngineDescription(CopyPropertiesToTemporalEventAnnotator.class)
>DocTimeRelAnnotator
>BackwardsTimeAnnotator
>EventTimeRelationAnnotator
>EventEventRelationAnnotator
>
>I see that there is a DocumentCreationTime type, but it seems to be
>initialized from inside one of the ClearTKAnnotators.
>
>I cannot find any documentation and don't know if it is looking for
>particular manifestations in the text or whether a property needs to be
>set externally on the JCAS or one of the SOFAs.
>
>Any help out there? Examples?
>
>Many thanks,
>
>Peter
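
The regex-based approach discussed in this thread can be sketched without any cTAKES or UIMA dependency. What follows is a minimal, hypothetical illustration: the class name, the findDocTimeSpan helper, and the "Date: MM/DD/YYYY" header format are all assumptions, not part of cTAKES or of the messages above. It computes only the begin/end character offsets that a custom annotator could then pass to setBegin and setEnd on a TimeMention. It uses Matcher.find(), which looks for the pattern anywhere in the text; Matcher.matches() succeeds only when the entire document text matches the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a cTAKES class: locates a header date by regex.
final class DocTimeSpanSketch {

    // Assumed header format "Date: MM/DD/YYYY"; adjust to the real note layout.
    static final Pattern DATE_PATTERN =
            Pattern.compile("Date:\\s*(\\d{1,2})/(\\d{1,2})/(\\d{4})");

    // Returns {begin, end} offsets of the first date found, or null if none.
    static int[] findDocTimeSpan(final String docText) {
        final Matcher dateMatcher = DATE_PATTERN.matcher(docText);
        if (!dateMatcher.find()) {
            return null;
        }
        // Span runs from the start of the month group to the end of the year group.
        return new int[] { dateMatcher.start(1), dateMatcher.end(3) };
    }

    public static void main(final String[] args) {
        final String text = "Date: 07/13/2016\nPatient seen for follow-up.";
        final int[] span = findDocTimeSpan(text);
        if (span == null) {
            System.out.println("no date found");
        } else {
            System.out.println(text.substring(span[0], span[1]));
        }
    }
}
```

In an analysis engine, the two offsets would replace the dateMatcher.start( 1 ) and dateMatcher.end( 2 ) arguments in the example earlier in the thread; which capture groups bound the span depends on how the pattern is written.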