ctakes-user mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: Help need to integrate UMLS2013AB dictionary
Date Mon, 19 May 2014 15:12:14 GMT
Just an FYI:
There are updated tools/scripts [1] that format and load dictionaries for Sean’s new, faster dictionary lookup.
[1] http://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/


From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
Sent: Saturday, May 17, 2014 12:31 PM
To: 'user@ctakes.apache.org'
Subject: RE: Help need to integrate UMLS2013AB dictionary

The tokenizer that used a file of hyphenated words was replaced a while back with a tokenizer
that implements the Penn Treebank tokenization rules (TokenizerPTB.java).
So as long as you use a recent copy of CreateLuceneIndexFromDelimitedFile, which references
TokenizerPTB instead of just Tokenizer, you can ignore the part about a hyphenated.txt file.
-- James

From: Ramprasad Reddy [mailto:ramprasadreddy.a@gmail.com]
Sent: Friday, May 16, 2014 4:20 PM
To: user@ctakes.apache.org
Subject: Help need to integrate UMLS2013AB dictionary

Good evening.

I have been trying to add the latest UMLS2013AB data to the resources, similar to UMLS2011AB. I tried
to follow the instructions at the following locations:

  *   https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423
  *   https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459

I have already extracted the data from the UMLS website and created the pipe-delimited text file as well.
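For readers following along, a pipe-delimited row can be split into fields as sketched below. This is only an illustration: the class name `PipeDelimitedRows`, the sample row, and the column order (CUI, term text, TUI) are assumptions, not the layout the cTAKES tools require.

```java
import java.util.Arrays;
import java.util.List;

public class PipeDelimitedRows {
    /** Splits one pipe-delimited line into fields, keeping empty trailing fields. */
    static List<String> splitRow(String line) {
        // "|" is a regex metacharacter, so it is escaped; the -1 limit keeps trailing empty fields.
        return Arrays.asList(line.split("\\|", -1));
    }

    public static void main(String[] args) {
        // Hypothetical row; the real column order depends on how the file was exported.
        String row = "C0000001|example term|T000";
        List<String> fields = splitRow(row);
        System.out.println(fields.get(0) + " -> " + fields.get(1));
    }
}
```

Checking the field count of every row before indexing is a cheap way to catch export problems early.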

But it looks like the way tokenization is handled has changed (there is no hyphenated.txt).

I am getting a "no main class" error while running CreateLuceneIndexFromDelimitedFile.java. I am
also looking for help with the steps to create the 'umls.data' and 'umls.backup' files in HSQLDB,
similar to those for umls2011ab.
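A "no main class" error generally means the JVM could not resolve the given class name on the classpath, for example because the fully qualified name or the classpath entry is wrong. A minimal diagnostic sketch follows; the helper class `MainClassCheck` is hypothetical and not part of cTAKES.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class MainClassCheck {
    /** Returns true if the named class loads and exposes public static void main(String[]). */
    static boolean hasMain(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod("main", String[].class);
            return Modifier.isStatic(m.getModifiers()) && m.getReturnType() == void.class;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing from the classpath, no suitable main method, or a dependency failed to load.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified class name to check; defaults to this class itself.
        String name = args.length > 0 ? args[0] : MainClassCheck.class.getName();
        System.out.println(name + (hasMain(name) ? " is launchable" : " is NOT launchable from this classpath"));
    }
}
```

Run it with the same classpath you use for the indexing tool and pass the tool's fully qualified class name as the argument.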

Any resources or steps you can share would be very helpful.

Thank you,
