From: Damir Olejar
Date: Fri, 22 May 2015 10:56:26 -0400
Subject: Re: Exploiting the power of cTakes, using OpenNLP only
To: dev@ctakes.apache.org

Thank you, Guergana. This helps a lot, and it will be valuable
information for the people I work for as they integrate cTAKES.

Damir

On Fri, May 22, 2015 at 8:04 AM, Savova, Guergana <Guergana.Savova@childrens.harvard.edu> wrote:

> Yes, you are correct. cTAKES does named entity recognition and
> normalization, i.e. mapping to an ontology (through the UMLS). The
> normalization part is what is different from what is usually done in the
> general domain (where mentions of several semantic types are discovered but
> not necessarily normalized to a concept within an ontology). In the general
> domain, there is a recent trend to normalize to Wikipedia (wikification).
>
> In short, to do the NER in cTAKES you do need a license for the UMLS.
> BTW, that license is free for level 0 vocabularies.
>
> Hope this information helps.
> --Guergana
>
> -----Original Message-----
> From: Damir Olejar [mailto:olejar.damir@gmail.com]
> Sent: Friday, May 22, 2015 7:51 AM
> To: dev@ctakes.apache.org
> Subject: Re: Exploiting the power of cTakes, using OpenNLP only
>
> To answer my own question, it all comes down to UMLS licensing and which
> files are downloaded from the server.
> The downloaded files are compressed *.model files that can be
> integrated with cTAKES.
> However, there is (or might be in the near future) a restriction on which
> user can download which files, and there might also be a copyright issue
> if the UMLS procedure is not followed.
>
> So, yes, there is no need for UIMA, but for any serious work the
> copyrights need to be respected.
>
>
> On Thu, May 21, 2015 at 12:10 PM, Damir Olejar wrote:
>
> > To whom it may concern,
> >
> > First, I would like to apologize if my question is vague, since I am
> > new and unaccustomed to the cTAKES terminology. To keep my question
> > simple and to the point, let us assume that I am working only with
> > Apache OpenNLP. I do not have any UIMA-specific JAR files included, and
> > let us assume that I do not want to include any of them (or want to keep
> > them to a minimum), thus keeping the project confined to OpenNLP as much
> > as possible.
> >
> > As far as I know, UIMA is just a framework that does not provide any
> > specific NLP tools (source:
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_24186742_is-2Duima-2Dprovides-2Donly-2Da-2Dwrapper-2Dor-2Dis-2Dit-2Dlike-2Dstandfordcore-2Dnlp-2Dand-2Dgate&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=umFvmAvfVN2FIHuugFp5H33UdNyy-mxG3U3yDPRMp9I&s=uM0wOUdg63NBJRXD3JRZeU0fx-jT8ide6bcZdx_-WY8&e=
> > ).
> > This means that there should be a way of integrating the cTAKES
> > components with OpenNLP.
> >
> > What I would like to do is simply apply Named Entity Recognition
> > (NER) to a text, so that I know which words in an input sentence
> > are medical terms. The ideal option would be a *.bin file such as
> > "en-ner-person.bin", but I think that cTAKES does not give us such
> > an option, since there are no *.bin files.
> >
> > How would I accomplish such a task? Is there any code, examples,
> > tutorials, documentation, pseudo-code, ideas, ... to take a look at?
> >
> > Thank you kindly for your time, understanding, and patience.
> >
> > Damir
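
The thread distinguishes two steps: recognizing a mention in text and
normalizing it to a concept in an ontology. A minimal sketch of that
difference follows. It is not how cTAKES or OpenNLP implement it: the
exact-match dictionary lookup is an assumption chosen for brevity, and the
names `Mention`, `TOY_DICTIONARY`, and `find_mentions`, along with the terms
and the `TOY-…` identifiers, are hypothetical placeholders rather than UMLS
content.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    text: str        # surface text exactly as it appears in the input
    concept_id: str  # identifier the mention is normalized to


# Hypothetical toy dictionary. Keys are lowercased token tuples; values are
# placeholder concept identifiers (NOT real UMLS codes).
TOY_DICTIONARY = {
    ("heart", "attack"): "TOY-0001",
    ("myocardial", "infarction"): "TOY-0001",  # synonym of the same concept
    ("aspirin",): "TOY-0002",
}


def find_mentions(text, dictionary=TOY_DICTIONARY):
    """Return dictionary terms found in `text`, longest match first."""
    # Tokenize on word characters, keeping character offsets for each token.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    max_len = max((len(key) for key in dictionary), default=0)

    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                mentions.append(
                    Mention(start, end, text[start:end], dictionary[key]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts at this token
    return mentions


if __name__ == "__main__":
    for mention in find_mentions(
            "Patient had a heart attack and was given Aspirin."):
        print(mention)
```

Recognition alone would stop at the character span and its text; the
`concept_id` field is the normalization step, which is why two different
surface forms ("heart attack", "myocardial infarction") can resolve to the
same identifier. In a real system that mapping would come from a licensed
vocabulary rather than a hand-written dictionary.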