MIME-Version: 1.0
X-Received: by 10.180.73.141 with SMTP id l13mr457734wiv.60.1391709908199; Thu, 06 Feb 2014 10:05:08 -0800 (PST)
Received: by 10.227.7.132 with HTTP; Thu, 6 Feb 2014 10:05:08 -0800 (PST)
In-Reply-To: <393252F14C42F946952F1ED75D316CAD38661A9F@CHEXMBX2A.CHBOSTON.ORG>
References: <70f03a80-ce1a-4c0e-b35d-5116d1c93ea0@googlegroups.com>
 <924DE05C19409B438EB81DE683A942D910667176@CHEXMBX1A.CHBOSTON.ORG>
 <393252F14C42F946952F1ED75D316CAD38661A9F@CHEXMBX2A.CHBOSTON.ORG>
Date: Thu, 6 Feb 2014 13:05:08 -0500
Message-ID:
Subject: Re: YTEX cTAKES 3.1.1 ready
From: vijay garla
To: "dev@ctakes.apache.org"
Cc: "ytex-users@googlegroups.com", "ctakes-dev@incubator.apache.org",
 "vlad.valtchinov@gmail.com"
Content-Type: multipart/alternative; boundary=f46d043c814cb5747504f1c0b619
X-Virus-Checked: Checked by ClamAV on apache.org

--f46d043c814cb5747504f1c0b619
Content-Type: text/plain; charset=ISO-8859-1

The cTAKES sentence detector is not changed in the YTEX branch. The YTEX
branch has an *additional* sentence detector that does not automatically
split sentences on newlines - users can use this if they like.

-vj


On Thu, Feb 6, 2014 at 1:01 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Vijay,
>
> > I have yet to run across clinical text from a real EMR where newlines
> > represent the end of a sentence
>
> Since James pointed out this possibility a couple weeks ago, I have kept
> my eyes open.  The problem is pretty ubiquitous in a corpus that I'm
> working with right now.  I just opened the first note and gave it a count
> ... 95 lines total, 9 are sentence/phrase (lacking punctuation) endings.
> This is not including lists, which comprise about half of the note.
> One possible conjoinment was "Will consider [...] biopsy\nGiven [...]".
> Depending upon how cTakes deals with it, the meaning could change
> drastically.
>
> > I believe cTAKES absolutely has to support sentences with newlines
> > within them
>
> Yes, cTakes should do so, but I hope that you aren't suggesting that it
> only support such a structure.
>
> Where is that easy button?
>
> -----Original Message-----
> From: vijay garla [mailto:vngarla@gmail.com]
> Sent: Thursday, February 06, 2014 10:31 AM
> To: dev@ctakes.apache.org
> Cc: ytex-users@googlegroups.com; ctakes-dev@incubator.apache.org;
> vlad.valtchinov@gmail.com
> Subject: Re: YTEX cTAKES 3.1.1 ready
>
> I believe it is worth migrating to trunk.
>
> Note that the sentence detector is also complementary - the existing
> ctakes sentence detector is unchanged - users can choose which sentence
> detector to use.  There are changes to assertion & dependency parsing to
> support sentences without newlines, and that works with both sentence
> detectors.
>
> I believe cTAKES absolutely has to support sentences with newlines within
> them - I have yet to run across clinical text from a real EMR where
> newlines represent the end of a sentence - the changes to assertion &
> dependency parsing will have to be done at some point.
>
> -vj
>
>
> On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei
> wrote:
>
> > VJ,
> > Aside from the changes to the existing cTAKES code (sentence detector,
> > etc.) [which we could leave out if it's still being debated], Do you
> > think it's worth migrating the ytex code to trunk at this point?
> > As you mentioned earlier, it's largely complementary.
> > [I was just thinking of saving effort to maintain the separate branch
> > and for simplicity for dev...]
> >
> > --Pei
> >
> > > -----Original Message-----
> > > From: vijay garla [mailto:vngarla@gmail.com]
> > > Sent: Wednesday, February 05, 2014 9:30 PM
> > > To: ytex-users@googlegroups.com; ctakes-dev@incubator.apache.org;
> > > vlad.valtchinov@gmail.com
> > > Subject: Re: YTEX cTAKES 3.1.1 ready
> > >
> > > Hi Vlad,
> > >
> > > I updated the umls install guide; see
> > > https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1
> > >
> > > I would prefer to add the docs in the ctakes confluence, but as far
> > > as I can tell, I don't have write access there - can somebody give me
> > > write privileges on the ctakes confluence site?
> > >
> > > There was a bug in the umls install; copy
> > > https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
> > > over the corresponding file in your ctakes-3.1.2 install
> > > (CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.
> > > The import is currently running on the UMLS 2013AA (I assume this
> > > will complete without issues as long as the umls schema hasn't
> > > changed from 2012).
> > >
> > > what trial and error did you have to go through to build the distro?
> > >
> > > -vj
> > >
> > >
> > > On Wed, Feb 5, 2014 at 5:33 PM, vijay garla wrote:
> > >
> > > > Hi Vlad,
> > > >
> > > > sorry that the instructions aren't clear.
> > > >
> > > > re 1) What I am trying to say is install
> > > > apache-ctakes-3.2.0-snapshot as usual (this is unchanged from
> > > > 3.1.1).  After that you still have to apply the lib and resources
> > > > (these are things that cannot be distributed via apache).
> > > >
> > > > re 2) Yes, I need to update those docs.  Hopefully will get to
> > > > that at some point.  However, I assume you already have a UMLS DB
> > > > (also assume SQL Server).  If you can't/don't want to use your
> > > > existing umls DB, please tell me.  Then I'll prioritize upgrading
> > > > the doc on importing the umls tables (the scripts are there).
> > > >
> > > > best,
> > > >
> > > > VJ
> > > >
> > > >
> > > > On Wed, Feb 5, 2014 at 4:44 PM, wrote:
> > > >
> > > >> Hi VJ-
> > > >>
> > > >> so, with trial and error were able to make the distribution and
> > > >> now have the apache-ctakes-3.1.2-SNAPSHOT-bin.zip archive.
> > > >>
> > > >> Here's what's unclear.
> > > >>
> > > >> 1. Is now this the only (combined) thing that you need for ctakes
> > > >> 3.1.1 + Ytex?
> > > >> the current documentation
> > > >> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
> > > >> which most probably is outdated, talks about installing cTakes
> > > >> 3.1.1 first and then applying 2 SNAPSHOT archives (downloadable),
> > > >> lib and resources.
> > > >> This is a confusion point.
> > > >>
> > > >> 2. The directions to import UMLS subset are then outdated as well.
> > > >> Maybe one should use the old version (ctakes 2.5 and ytex 0.8) to
> > > >> import the RRF files for the UMLS subset and then just use the
> > > >> resulting db. Thoughts?
> > > >>
> > > >> Thanks,
> > > >> Vlad Valtchinov
> > > >> Brigham Rad
> > > >>
> > > >>
> > > >> On Thursday, January 30, 2014 5:17:43 PM UTC-5, vijay garla wrote:
> > > >>
> > > >>> Hi Vlad,
> > > >>>
> > > >>> All of ytex has been moved into ctakes, it is currently in a
> > > >>> branch (https://svn.apache.org/repos/asf/ctakes/branches/ytex).
> > > >>> You don't have to install ytex-0.8 - instead you will have to
> > > >>> build and install from the ytex branch to create your own
> > > >>> distribution.  Steps 2 & 3 are correct.
> > > >>>
> > > >>> Although it is a pain, if you have the jdk, maven, and svn, you
> > > >>> can easily build your own distro:
> > > >>> * open a command prompt
> > > >>> * make sure jdk, maven, and svn are in your path
> > > >>> * cd to some directory where you want to check stuff out (I like
> > > >>> c:\temp)
> > > >>> * run the following commands
> > > >>> rmdir /s /q ctakes
> > > >>> svn co https://svn.apache.org/repos/asf/ctakes/branches/ytex ctakes
> > > >>> cd ctakes
> > > >>> mvn clean install -DskipTests
> > > >>>
> > > >>> And you will have the ctakes (with ytex) distro in
> > > >>> ctakes\ctakes-distribution\target\apache-ctakes-3.1.2-SNAPSHOT-bin.zip
> > > >>>
> > > >>> What is the process for getting the ytex branch merged into trunk?
> > > >>> As I mentioned, there are very few changes to other ctakes
> > > >>> classes/types - this should be completely complementary and not
> > > >>> affect any existing ctakes functionality.
> > > >>>
> > > >>> -vj
> > > >>>
> > > >>>
> > > >>> On Thu, Jan 30, 2014 at 4:56 PM, wrote:
> > > >>>
> > > >>>> Hi VJ--
> > > >>>>
> > > >>>> this is great!! Thanks for all the hard work on it!
> > > >>>>
> > > >>>> We're starting to look into the new install. For now we're
> > > >>>> trying the binaries out.
> > > >>>>
> > > >>>> There were these questions about the proper install steps:
> > > >>>>
> > > >>>> 1. Do we first install ytex-0.8
> > > >>>> 2. Then install the new cTakes 3.1.1 instance and also apply the
> > > >>>> SNAPSHOT lib and resources zips
> > > >>>> 3. Work our way to install the UMLS ontologies in the db
> > > >>>>
> > > >>>> It is not entirely clear from the new document
> > > >>>> (https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1)
> > > >>>> if there's still need to install ytex-0.8, or YTEX has been
> > > >>>> entirely merged into cTakes?
> > > >>>>
> > > >>>> If the last statement is correct, there are missing parts in
> > > >>>> i.e the UMLS install steps that are linked from the new ctakes
> > > >>>> 3.1.1 document.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> vlad
> > > >>>>
> > > >>>>
> > > >>>> On Friday, January 3, 2014 10:21:52 PM UTC-5, vijay garla wrote:
> > > >>>>>
> > > >>>>> Hello All,
> > > >>>>>
> > > >>>>> I have finished an initial cut at the port of YTEX to cTAKES
> > > >>>>> 3.1.1.  Most of the YTEX functionality has been ported and
> > > >>>>> integrated with cTAKES, and I've tested with MySQL and MS SQL
> > > >>>>> Server (oracle tests pending).
> > > >>>>>
> > > >>>>> Most of the changes were made in new projects - very little
> > > >>>>> existing cTAKES code has been modified.  The only non-trivial
> > > >>>>> changes are in
> > > >>>>> /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
> > > >>>>> - here I modified
> > > >>>>> CharacterOffsetToLineTokenConverterCtakesImpl &
> > > >>>>> SingleDocumentProcessorCtakes to deal with newlines within
> > > >>>>> sentences correctly.  Can somebody take a look at the changes
> > > >>>>> in the ytex branch?
> > > >>>>>
> > > >>>>> I believe that the branch
> > > >>>>> https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready
> > > >>>>> to be merged into ctakes trunk, but would like other users to
> > > >>>>> test it as well.  Questions:
> > > >>>>>
> > > >>>>> * How can I distribute the ctakes binary distribution to ytex
> > > >>>>> users before the merge?  Can we make the branch build available
> > > >>>>> somewhere?  The binary distribution is too large to host on
> > > >>>>> the ytex google code site (max 200 MB)
> > > >>>>> * Non-ASF libraries - I have segregated these out into their
> > > >>>>> own zip file that can be distributed via sourceforge.
> > > >>>>> As a stopgap, I can upload this to the ytex google code site,
> > > >>>>> but would prefer to upload to sourceforge.
> > > >>>>> * UMLS Derivatives - Ditto for these - would like to move to
> > > >>>>> sourceforge.
> > > >>>>> * Documentation - How can I update the confluence docs?  I
> > > >>>>> would migrate the documentation from the google code website.
> > > >>>>>
> > > >>>>> Here the installation instructions (putting the wagon in front
> > > >>>>> of the horse ...)
> > > >>>>>
> > > >>>>> https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998&updated=Installation_cTAKES_3_1
> > > >>>>>
> > > >>>>> Best,
> > > >>>>>
> > > >>>>> VJ
> > > >>>>>
> > > >>>>>
> > > >>>> --
> > > >>>> You received this message because you are subscribed to the
> > > >>>> Google Groups "ytex-users" group.
> > > >>>> To unsubscribe from this group and stop receiving emails from
> > > >>>> it, send an email to ytex-users+...@googlegroups.com.
> > > >>>> To post to this group, send email to ytex-...@googlegroups.com.
> > > >>>> To view this discussion on the web visit
> > > >>>> https://groups.google.com/d/msgid/ytex-users/70f03a80-ce1a-4c0e-b35d-5116d1c93ea0%40googlegroups.com.
> > > >>>>
> > > >>>> For more options, visit https://groups.google.com/groups/opt_out.
> > > >>>>
> > > >>>
> > > >> --
> > > >> You received this message because you are subscribed to the
> > > >> Google Groups "ytex-users" group.
> > > >> To unsubscribe from this group and stop receiving emails from it,
> > > >> send an email to ytex-users+unsubscribe@googlegroups.com.
> > > >> To post to this group, send email to ytex-users@googlegroups.com.
> > > >> To view this discussion on the web visit
> > > >> https://groups.google.com/d/msgid/ytex-users/bc3bd705-55d2-4acd-a273-a3b1a7b36241%40googlegroups.com.
> > > >>
> > > >> For more options, visit https://groups.google.com/groups/opt_out.

--f46d043c814cb5747504f1c0b619--
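
To make the disagreement in this thread concrete, here is a minimal Python sketch of the two splitting policies being discussed. It is not the cTAKES or the YTEX sentence detector; the function names and the sample note are made up for illustration, and real detectors use trained models rather than a regular expression. The first function treats every newline as a sentence boundary, the second ignores newlines and splits only after terminal punctuation.

```python
import re


def split_on_punctuation(text):
    """Ignore newlines: collapse all whitespace, then split after . ! or ?"""
    flat = " ".join(text.split())
    if not flat:
        return []
    return re.split(r"(?<=[.!?])\s+", flat)


def split_on_newlines(text):
    """Treat every newline as a sentence boundary, then split each line on punctuation."""
    sentences = []
    for line in text.splitlines():
        sentences.extend(split_on_punctuation(line))
    return sentences


# Hypothetical note text (not from any real record): the first newline falls
# inside a sentence, the second one ends a phrase that has no punctuation.
note = "Patient denies chest\npain. Will consider biopsy\nGiven history of smoking."

for policy in (split_on_newlines, split_on_punctuation):
    print(policy.__name__, policy(note))
```

On this sample, the newline policy cuts "chest / pain." in two, while the punctuation-only policy joins "Will consider biopsy" with "Given history of smoking.", which is the kind of conjoinment Sean describes. Neither simple rule handles both cases, which is one argument for keeping both detectors available and letting users choose.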