ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: How to predict for sequence of words
Date Fri, 22 May 2015 13:53:04 GMT
We use sequence classifiers in the temporal project to extract temporal expressions. One way
to do it is called BIO tagging, where each element in the sequence is classified as Begin,
Inside, or Outside of some span by a standard classifier such as an SVM. Another way is to use an
explicit sequence model such as an HMM or CRF (also using BIO labels, but finding a globally optimal
tagging rather than classifying one token at a time). We use the ClearTK library for its feature
extraction and its interfaces to machine learning libraries. There are examples of both kinds
of model in ctakes-temporal.
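
To make the BIO scheme concrete, here is a minimal sketch in plain Python. It is not the ClearTK or cTAKES API; the function names `spans_to_bio` and `bio_to_spans`, the example sentence, and the token-index span convention (start inclusive, end exclusive) are all assumptions made for illustration. The first function turns gold-standard spans into per-token training labels, and the second turns predicted labels back into spans.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive (assumed convention)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping token spans as one B/I/O label per token."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"              # first token of the span
        for i in range(start + 1, end):
            labels[i] = "I"              # remaining tokens of the span
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Decode a B/I/O label sequence back into token spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:        # a new span begins right after another
                spans.append((start, i))
            start = i
        elif label == "O":
            if start is not None:        # the open span ends here
                spans.append((start, i))
                start = None
        elif label == "I" and start is None:
            start = i                    # lenient: treat a stray I as a span start
    if start is not None:                # span runs to the end of the sequence
        spans.append((start, len(labels)))
    return spans


# Hypothetical example: one temporal expression, "May 22 , 2015" (tokens 5..8).
tokens = "The patient was seen on May 22 , 2015 for follow-up".split()
labels = spans_to_bio(len(tokens), [(5, 9)])
# labels == ['O', 'O', 'O', 'O', 'O', 'B', 'I', 'I', 'I', 'O', 'O']
```

In this framing, training data is a sequence of (token features, label) pairs, and the classifier only ever predicts one of the three labels; `bio_to_spans` is the post-processing step that recovers the extracted expressions.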

TimeAnnotator uses an SVM BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/TimeAnnotator.java

CRFTimeAnnotator uses a CRF BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/CRFTimeAnnotator.java

This is a very high-level overview, and there are many details to get right. Foremost is the
need for gold-standard labeled training data. The classes above are trained on THYME data.

Tim

On 05/22/2015 02:11 AM, Soumya Shree wrote:
Hi folks,

I am new to cTAKES and NLP concepts. I need to train my application so that it can make
predictions for a sequence of words. Is there an API that helps with that, or a concept
we can leverage for it? I also need to create a training bin file, so I need to know the
structure of the training text so that I can validate it and convert it to a bin file
successfully.

Thanks & Regards,
Soumya Shree


