ctakes-dev mailing list archives

From vijay garla <vnga...@gmail.com>
Subject Re: move ytex annotators to ctakes.apache.org?
Date Mon, 21 Oct 2013 12:57:42 GMT
Hello All,

I've started on the ytex-ctakes port, and have some packaging questions.

* Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
I understand that we will not ship these jars as part of the ctakes
download.  Can we bundle the jars and ship them as part of an additional
download, available via sourceforge?  Hibernate is available via maven
central; weka and the jdbc drivers are not.  I have added the weka & jdbc
jars as system dependencies.  I'm not sure how you collect all the
dependencies for shipment, but how do I tell maven not to include these?
Is it OK to check the weka & jdbc jars into source control?

* desc vs <project>-res
What are the guidelines for what goes where?  Configuration files are found
in both places, whereas data/models are in the -res directory.  Ytex has
many non-uima config files (hibernate, spring) which should be
user-modifiable, and I would put them in the desc directory.  However, desc
is not in the project classpath (but it is in the classpath for the ctakes
distro, e.g. in runctakesCPE.bat).  Any reason for this discrepancy?  I
would add desc as a resources directory in the pom.

* distribution of umls concept graphs
For semantic similarity and word sense disambiguation, ytex provides
concept graphs derived from the UMLS.  We have a download site that
requires UTS login to get these concept graphs (
http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip).  I take it I
would just create a -res directory and add the concept graphs here, and
they would automagically appear in the ctakes-resources zip?

* patches to other ctakes projects
ytex has some patches to other ctakes annotators for handling edge cases
where they throw an exception; I will check to see if these changes
have already been made.  If not, I will file separate Jira tickets for
these patches.  Also, the CharacterOffsetToLineTokenConverterCtakesImpl
needs to be modified to properly handle cases where newlines are in
sentences; I will add a patch for that as well.

* post-download setup
ytex provides an ant script to simplify the post-download setup (database
schema setup, configuration file generation).  Would it be possible to
ship ant with the ctakes distro, so that users can execute these scripts?
If not, how best to automate setup?  I know from experience with earlier
versions of ytex that setting up the database schema is error-prone, and
that this needs to be automated.
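
To illustrate the newline issue mentioned under "patches to other ctakes
projects": below is a minimal sketch of mapping a character offset to a line
number without assuming that a sentence sits on a single line.  This is not
the actual CharacterOffsetToLineTokenConverterCtakesImpl code or the planned
patch; the class LineOffsetIndex and its methods are hypothetical names used
only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of cTAKES or YTEX): maps character offsets
 * to zero-based line and column numbers, independent of sentence boundaries.
 */
final class LineOffsetIndex {
    // Offset of the first character of each line, in ascending order.
    private final int[] lineStarts;

    LineOffsetIndex(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1); // the next line starts right after '\n'
            }
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Zero-based line containing the given offset; a '\n' belongs to the line it ends. */
    int lineOf(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        int pos = Arrays.binarySearch(lineStarts, offset);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // containing line is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Zero-based column of the offset within its line. */
    int columnOf(int offset) {
        return offset - lineStarts[lineOf(offset)];
    }
}
```

With such an index, a sentence whose begin and end offsets fall on different
lines can be reported per offset, rather than by counting sentences as lines.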


I was planning on creating the following projects:
* ctakes-ytex:
Base ytex, includes semantic similarity tools.  This has no dependencies on
ctakes, and I would create a separate distribution of just this package for
a semantic similarity distro.
* ctakes-ytex-res
Includes concept graphs for semantic similarity.
* ctakes-ytex-web
Provides a user interface plus RESTful and web service interfaces to the
semantic similarity service.  This has no dependencies on ctakes, and it
would be included in the semantic similarity distro.
* ctakes-ytex-uima
Includes ytex analysis engines
* ctakes-ytex-uima-res
resources for ytex analysis engines

Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
existing projects (though I don't know where they would fit).

Best,

Vijay




On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <vngarla@gmail.com> wrote:

> Hi Pei,
>
> The WSD annotator relies on the semantic similarity component, which
> is a general purpose tool not strictly limited to ctakes or NLP.  I
> would like to keep the semantic similarity component 'standalone',
> i.e. with no dependencies on ctakes, and make it redistributable on
> its own.  If that is possible as part of ctakes, I'd love to move it.
> If not, I'd leave the semantic similarity and the associated WSD
> annotator on google code.
>
> For those of you who want the back story:
> http://www.biomedcentral.com/1471-2105/13/261
> http://jamia.bmj.com/content/20/5/882.long
>
>
> -vj
>
> On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> <Pei.Chen@childrens.harvard.edu> wrote:
> > vj,
> > Were you thinking of contributing the new ytex Word Sense
> > Disambiguation component as well? I think that will be really cool.
> > --Pei
> >
> >> -----Original Message-----
> >> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik
> >> Sarma
> >> Sent: Thursday, October 03, 2013 1:05 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: move ytex annotators to ctakes.apache.org?
> >>
> >> This would be quite valuable -- in particular, ytex's annotation database
> >> connection is much easier to use than what ships with cTAKES. There are a
> >> fair number of other advantages, and I think they'd all be very valuable!
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Karthik Sarma
> >> UCLA Medical Scientist Training Program Class of 20??
> >> Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> >> to the House of Delegates of the American Medical Association
> >> ksarma@ksarma.com
> >> gchat: ksarma@gmail.com
> >> linkedin: www.linkedin.com/in/ksarma
> >>
> >>
> >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <vngarla@gmail.com> wrote:
> >>
> >> > Hello All,
> >> >
> >> > I'd like to contribute ytex to ctakes.  YTEX's main feature is the
> >> > ability to store *any* ctakes (or uima) annotation in a relational
> >> > database (in a relational format), and the ability to export these
> >> > annotations to ML packages (weka, libsvm, matlab, R).  All of this is
> >> > purely declarative/via configuration.
> >> >
> >> > In addition, Ytex provides the following:
> >> > * Negation Detection with Negex
> >> > * SegmentRegexAnnotator - section detection with regular expressions
> >> > * NamedEntityRegexAnnotator - named entity detection with regular
> >> > expressions
> >> > * Sentence Splitter - modified ctakes sentence splitter making
> >> > sentence split patterns configurable (not hardcoded to \n)
> >> >
> >> > YTEX currently works with ctakes 2.5; I would like to upgrade it to
> >> > the latest ctakes, and if the community is interested, contribute to
> >> > ctakes.apache.org.
> >> >
> >> > A licensing question: YTEX uses Spring (apache 2.0 license), Hibernate
> >> > (lgpl 2.1), & weka (gpl).  Are there any issues with including these?
> >> >
> >> > Cheers
> >> >
> >> > vj
> >> >
>
