From: "Miller, Timothy"
To: "dev@ctakes.apache.org"
Subject: RE: The fast dictionary pipeline vs. the regular one
Date: Sun, 21 Jun 2015 18:45:54 +0000

Sean wrote the fast version and may be able to answer your specific questions.
But in general, the fast dictionary does not exactly match the regular one's output: it does not implement an equivalent search, and it uses different indexing methods. We are happy to receive reports of what seem like bugs, though; any new software is likely to have some. What I will say is that I know Sean has run some (as yet unpublished) experiments, and we believe that in the aggregate the new system's output is at least as high quality as the older one's.
Tim

________________________________________
From: Oranit Dror [oranit@algotec.co.il]
Sent: Sunday, June 21, 2015 4:37 AM
To: dev@ctakes.apache.org
Subject: The fast dictionary pipeline vs. the regular one

Hello,

I am using cTAKES 3.2.2 with the regular pipeline. Recently, I tested the fast dictionary pipeline, and indeed it is much faster.
However, I have encountered several quality differences in the returned annotations. For example:

1. With the fast pipeline, the term "GBM" is annotated as "glioblastoma multiforme", while with the regular pipeline it is annotated as "glioblastoma".
Note that according to the UMLS DB, the concept for "GBM" is "glioblastoma", and "glioblastoma multiforme" is mapped to a narrower concept.

2. The word "cm" in a phrase like "5.5 cm X 2.6 cm" is annotated by the regular pipeline as "Cutaneous Mastocytosis", while the fast pipeline does not annotate it as a medical term (as expected and as in UMLS).

Is there any explanation for the differences?

Thank you,
Oranit.