From: "Dligach, Dmitriy"
To: cTAKES Developer list
Subject: Re: lvg entries
Date: Thu, 17 Apr 2014 17:14:25 +0000
Message-ID: <48B49832-D03A-4B0B-B373-C2760B243EE5@childrens.harvard.edu>

I don’t know of any applications within cTAKES that make use of this… The reverse (mapping from these “variants” to the normal form) may be useful though.

Dima

On Apr 17, 2014, at 11:50, Miller, Timothy wrote:

> Sure, just as an example, I gave it a note with about 1000 words. It
> generates 11500 NonEmptyFSList elements (each is basically one lexical
> variant).
>
> For the word "symptomatic", these are the first 10 of 20 lexical variants:
> Symptomaticer/JJ
> Symptomaticer/RB
> Symptomaticed/VB
> Symptomaticcing/VB
> Symptomatics/VB
> Symptomatics/NN
> Symptomaticked/VB
> Symptomatic/VB
> Symptomatic/JJ
> Symptomatic/RB
>
> Tim
>
> On 04/17/2014 12:31 PM, Dligach, Dmitriy wrote:
>> Tim, this is a very interesting observation. Could you please send a few examples of what LVG generates? Both sensical and non :)
>>
>> Dima
>>
>> On Apr 17, 2014, at 11:28, Miller, Timothy wrote:
>>
>>> The LVG annotator creates an enormous number of "lemmas" for every
>>> WordToken in the CAS, and I'm wondering what the original purpose was? I
>>> think this is probably a minor bottleneck for speed but mostly a pretty
>>> big space hog (at least 50% of the space of xmi files in my tests).
>>>
>>> As of right now I'm not sure if any downstream components are using
>>> these lemmas, and on a manual inspection the precision seems to be
>>> pretty abysmal (meaning most of them are nonsensical as lexical
>>> variants), so as I said, just wondering if we can revisit why cTAKES
>>> generates so many and whether that component can be optimized.
>>>
>>> Thanks
>>> Tim