From: samir chabou
To: dev@ctakes.apache.org
Subject: Re: Common Type System across systems?
Date: Wed, 2 Oct 2013 12:30:32 -0700 (PDT)
Message-ID: <1380742232.50581.YahooMailNeo@web140301.mail.bf1.yahoo.com>
In-Reply-To: <3656FD2E-AD51-4DCF-AC3F-B40E7255F616@apache.org>

Just a thought ...
We could probably have some kind of interface (method) that allows getting the
run-time attributes of a certain type system. These methods would hide some code
complexity. For example, BaseToken.getSentenceNumber() would return the sentence
number based on selectCovering, indexCovering, or isCovered.
Thanks
Samir

________________________________
From: Richard Eckart de Castilho
To: dev@ctakes.apache.org
Sent: Wednesday, October 2, 2013 12:07:36 PM
Subject: Re: Common Type System across systems?

Thanks for the reference, I'll have a look at it.

I don't plan to invent the ultimate type system :P Of course that would be
doomed to fail. I also don't plan to venture into the design of the special
medical types that cTAKES needs in addition.

I plan to make suggestions for the basic analysis levels (e.g.
sentence, token) and possibly work up from there into some of the lower linguistic
analysis levels, as well as to suggest general design patterns. There are
also some ideas how to handle adoption to reduce changes to code to a minimum.

I think there is some realistic potential. But let's see how far this can be
pushed… if anywhere at all :) Maybe I'm too optimistic :P

-- Richard

On 02.10.2013, at 17:53, "Wu, Stephen T., Ph.D." wrote:

> Richard, it'd be great if you are able to put dedicated effort to it,
> i.e., take the lead for (1) below!
>
> Unfortunately, in our experience, you still need a lot of people and their
> time to be involved in (2), which often requires funding, and as mentioned
> in (2a) if it is not binding then people will be unlikely to adopt. Maybe
> I'm overly pessimistic?
>
> One specific portion of the cTAKES type system is that we make separate
> types for the clinical semantic groups. The referential semantics portion
> of the type system was the main focus of our efforts (see reference below)
> due to its importance in the medical domain. This is quite different than
> semantic structures, e.g., Discourse Representation Theory. Richard, I'm
> interested in how you'd view the differences as someone who wasn't
> involved in their creation.
>
> I think we made plenty of mistakes that make life difficult for people at
> a practical level, since we were designing it not necessarily even tied to
> UIMA. But hopefully with your additional work it will be really good!
>
> Anyways good luck! =P
>
> stephen
>
> * Wu, Stephen T, Vinod C Kaggal, Dmitriy Dligach, James J Masanz, Pei
> Chen, Lee Becker, Wendy W Chapman, Guergana K Savova, Hongfang Liu,
> Christopher G Chute. A common type system for clinical natural language
> processing. J Biomed Sem.
> 4:1. 2013.
>
> On 10/1/13 2:53 PM, "Karthik Sarma" wrote:
>
>> This seems like a *very* challenging and involved problem to me...
>>
>> On Tuesday, October 1, 2013, Pei Chen wrote:
>>
>>> Agreed.
>>> Yes, I think this is a slight augmentation and extension of the original
>>> vision of the clinical common type system, by having it work with other
>>> UIMA-based NLP systems. Having worked on item (3) for cTAKES, I actually
>>> think the tough part will be getting consensus and agreement on a system
>>> between all parties and less on the required code changes. Hence, just
>>> wanted to ping the community to gauge interest and see if this actually
>>> makes sense [It would be nice to plug in different POSTaggers, for example,
>>> without having to remap types].
>>> If we have a willing volunteer (Richard :)?) to perform some of the
>>> prelim analysis Q1 2014 with our existing type system, perhaps we can
>>> actually make this happen.
>>>
>>> 4a) I think the SHARP4 development group has essentially moved to the
>>> cTAKES ASF community which is probably even better since it already has
>>> a meritocratic/governance mechanism to handle changes.
>>>
>>> On Tue, Oct 1, 2013 at 10:39 AM, Wu, Stephen T., Ph.D. wrote:
>>>
>>>> Pei et al,
>>>> That was the vision for the SHARP "common type system", except it was
>>>> meant to include medical-related projects rather than general projects.
>>>>
>>>> Steve's process below is probably the most realistic way to do things, and
>>>> it's basically how we did the current cTAKES type system. Unfortunately,
>>>> the "someone" doing #1 was me, and I didn't realize that it would be quite
>>>> difficult. I guess I know more about how to do it now but #1 and #2 were
>>>> surprisingly harder than I expected. I'm adding a
>>>> #4:
>>>>
>>>> (1) Have someone inspect the various type systems closely and make a
>>>> proposal
>>>>  A. Know each of the type systems on their own. Essential to visualize
>>>> them appropriately, but it is still difficult to understand the
>>>> implications of type changes just by looking. (By the way, we never came
>>>> up with a really great automatic visualization tool, closest was a Protégé
>>>> plugin. Excellent visualization would go a long way, especially if edits
>>>> were possible.)
>>>>  B. Categorize portions of type systems to compare and take them a step
>>>> at a time.
>>>>  C. Clearly limit which type systems you are going to consider for your
>>>> comparison and reconciliation.
>>>>  D. Pick a starting point. I found it nearly impossible to create from
>>>> scratch when you're staring at 4-5 other type systems. We started from
>>>> the old cTAKES type system but that did cause some bias!
>>>>  E. Develop real criteria (or at least opinions) for choosing between the
>>>> many options.
>>>>
>>>> (2) Agree on the proposal.
>>>>  A. Multiple projects should make a binding agreement to implement. This
>>>> means, most likely, that somebody needs to have assurance of funding.
>>>> In our case, we only made it binding for cTAKES, so it is only used by
>>>> cTAKES (as far as I know).
>>>>  B.
>>>> With different projects' vested interests on the line, have some real
>>>> discussions of what your project is going to give up with the proposed
>>>> stuff.
>>>>
>>>> (3) Spend the time to re-write all the code to use the new type system.
>>>>  * As Steve said, this is time-consuming, especially if things get broken
>>>> and models need to be retrained, etc.
>>>>
>>>> (4) Ensure maintenance and modifiability across projects.
>>>>  A. The original SHARP common type system vision handed off the
>>>> maintenance to the Software Development Group, but that never really
>>>> happened. I hope the Apache community can serve as this to some degree,
>>>> but so far it has still depended on unreliable people like myself.
>>>>  B. A means of having everyone automatically draw from the same source
>>>> code would be preferable.
>>>>  C. If, in the future, you need to consider another UIMA project whose
>>>> type system should be reconciled... Well, that's happening right now. I
>>>> guess you can worry about it when you get there if you have a community
>>>> that's willing to deal with it.
>>>>
>>>> Those are just some thoughts. It's not impossible, but neither is it
>>>> simple.
>>>>
>>>> stephen
>>>>
>>>> On 9/30/13 8:17 PM, "Steven Bethard" wrote:
>>>>
>>>>> We (ClearTK) talked with Richard (DKPro) about doing this for ClearTK
>>>>> and DKPro.
>>>>> Basically, both groups were all for it, but the main issue
>>>>> was time. Basically you need to:
>>>>>
>>>>> (1) Have someone inspect the various type systems closely and make a
>>>>> proposal
>>>>> (2) Agree on the proposal.
>>>>> (3) Spend the time to re-write all the code to use the new type system.
>>>>>
>>>>> Step (3) is especially time consuming, but in fact, we never managed
>>>>> to get the free time for step (1).
>>>>>
>>>>> That all said, ClearTK would love to share a common type system with
>>>>> other projects.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Mon, Sep 30, 2013 at 7:38 PM, Pei Chen wrote:
>>>>>> Richard, I, and few others had an interesting bar conversation...
>>>>>> In the spirit of interoperability, What if we had a baseline common type
>>>>>> system that could be reused across UIMA compatible NLP systems?
>>>>>> Imagine for a moment that OpenNLP, Clea
>>
>> --
>> Karthik Sarma
>> UCLA Medical Scientist Training Program Class of 20??
>> Member, UCLA Medical Imaging & Informatics Lab
>> Member, CA Delegation to the House of Delegates of the American Medical
>> Association
>> ksarma@ksarma.com
>> gchat: ksarma@gmail.com
>> linkedin: www.linkedin.com/in/ksarma
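
For illustration only, the accessor idea from the top of this message (a method on the token that returns its sentence number by looking up the covering sentence) might look roughly like the sketch below. Everything in it is an assumption: Span, SentenceLookup and sentenceNumberOf are hypothetical names, not cTAKES or uimaFIT API, and plain character offsets stand in for UIMA annotations. In a real pipeline the linear scan would presumably be replaced by a covering lookup such as uimaFIT's JCasUtil.selectCovering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UIMA annotation: a [begin, end) character span.
class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // True if this span fully contains the other span.
    boolean covers(Span other) {
        return begin <= other.begin && other.end <= end;
    }
}

// Hypothetical helper that hides the covering lookup behind one method.
class SentenceLookup {
    private final List<Span> sentences; // assumed to be in document order

    SentenceLookup(List<Span> sentences) {
        this.sentences = new ArrayList<>(sentences);
    }

    // Returns the 1-based number of the first sentence covering the token,
    // or -1 if no sentence covers it.
    int sentenceNumberOf(Span token) {
        for (int i = 0; i < sentences.size(); i++) {
            if (sentences.get(i).covers(token)) {
                return i + 1;
            }
        }
        return -1;
    }
}

class SentenceNumberSketch {
    public static void main(String[] args) {
        // Text: "Fever noted. No cough." with sentences [0,12) and [13,22).
        SentenceLookup lookup =
                new SentenceLookup(Arrays.asList(new Span(0, 12), new Span(13, 22)));

        System.out.println(lookup.sentenceNumberOf(new Span(0, 5)));   // "Fever" -> 1
        System.out.println(lookup.sentenceNumberOf(new Span(16, 21))); // "cough" -> 2
        System.out.println(lookup.sentenceNumberOf(new Span(10, 15))); // straddles both -> -1
    }
}
```

The point of the design is that callers ask the helper for a sentence number and never touch the index themselves, so the lookup strategy could change without touching calling code.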