X-Original-To: apmail-ctakes-notifications-archive@www.apache.org
Delivered-To: apmail-ctakes-notifications-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4540A18A7E for ; Thu, 9 Jul 2015 19:24:10 +0000 (UTC)
Received: (qmail 54344 invoked by uid 500); 9 Jul 2015 19:24:05 -0000
Delivered-To: apmail-ctakes-notifications-archive@ctakes.apache.org
Received: (qmail 54298 invoked by uid 500); 9 Jul 2015 19:24:05 -0000
Mailing-List: contact notifications-help@ctakes.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ctakes.apache.org
Delivered-To: mailing list notifications@ctakes.apache.org
Received: (qmail 54193 invoked by uid 99); 9 Jul 2015 19:24:05 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 09 Jul 2015 19:24:05 +0000
Date: Thu, 9 Jul 2015 19:24:05 +0000 (UTC)
From: "britt fitch (JIRA)"
To: notifications@ctakes.apache.org
Subject: [jira] [Created] (CTAKES-368) Allow alternate CUI formats in fast dictionary lookup module
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

britt fitch created CTAKES-368:
----------------------------------

             Summary: Allow alternate CUI formats in fast dictionary lookup module
                 Key: CTAKES-368
                 URL: https://issues.apache.org/jira/browse/CTAKES-368
             Project: cTAKES
          Issue Type: Improvement
          Components: ctakes-dictionary-lookup
    Affects Versions: 3.2.2
            Reporter: britt fitch
            Assignee: Sean Finan
             Fix For: 3.2.3


The current fast lookup using a BSV parses the first field as “C” and up to 7 numerals, padding with “0” as needed to reach that length when applicable [see CuiCodeUtil.getCuiCode(String)]. The CUI string is then substring’d from 1 to len and parsed as a Long.

This is producing issues with other related, but separate, ontologies (MedGen) where the bulk of concepts use UMLS CUIs but some additional concepts were created by the NCBI where no CUI previously existed.
These MedGen-specific concepts are created with a prefix “CN” + 6 numerals, resulting in “N123456” failing to produce a Long.

It is preferred to allow alternative CUI formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
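
One possible way to accept both formats described in the issue, offered only as a rough sketch and not as the cTAKES implementation or the eventual fix: split the CUI into its letter prefix and its digits, parse only the digits as a long, and record the prefix in the high-order part of the code. The class name CuiFormats, the methods toCode and toCui, the prefix list and the multiplier are all hypothetical names chosen for illustration. Plain "C" CUIs keep the same numeric code that substring-and-parse would give them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: maps CUIs with different
// letter prefixes onto distinct long codes and back again.
final class CuiFormats {
    // Assumed prefix registry. Index 0 ("C") leaves plain UMLS codes unchanged.
    private static final List<String> PREFIXES = Arrays.asList("C", "CN");
    // Assumed multiplier, larger than any 7-digit CUI number.
    private static final long PREFIX_MULTIPLIER = 100_000_000L;
    // Total CUI length described in the issue: "C" + 7 digits or "CN" + 6 digits.
    private static final int CUI_LENGTH = 8;
    private static final Pattern CUI = Pattern.compile("([A-Za-z]+)(\\d+)");

    private CuiFormats() {
    }

    // Encodes a CUI such as "C0000123" or "CN123456" as a long.
    static long toCode(String cui) {
        Matcher matcher = CUI.matcher(cui.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unrecognized CUI: " + cui);
        }
        int prefixIndex = PREFIXES.indexOf(matcher.group(1).toUpperCase(Locale.ROOT));
        if (prefixIndex < 0) {
            throw new IllegalArgumentException("Unknown CUI prefix: " + cui);
        }
        long number = Long.parseLong(matcher.group(2));
        if (number >= PREFIX_MULTIPLIER) {
            throw new IllegalArgumentException("CUI number too large: " + cui);
        }
        return prefixIndex * PREFIX_MULTIPLIER + number;
    }

    // Rebuilds the zero-padded CUI text from a code produced by toCode.
    static String toCui(long code) {
        if (code < 0 || code / PREFIX_MULTIPLIER >= PREFIXES.size()) {
            throw new IllegalArgumentException("Code out of range: " + code);
        }
        String prefix = PREFIXES.get((int) (code / PREFIX_MULTIPLIER));
        int width = CUI_LENGTH - prefix.length();
        return prefix + String.format("%0" + width + "d", code % PREFIX_MULTIPLIER);
    }
}
```

With this scheme CuiFormats.toCode("CN123456") no longer fails, and CuiFormats.toCui restores the original text for either prefix. Supporting a further prefix would mean appending it to the assumed PREFIXES list; reordering the list would change the codes already assigned.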