ctakes-dev mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: move ytex annotators to ctakes.apache.org?
Date Mon, 21 Oct 2013 20:26:14 GMT
Hi Vijay,
This is awesome.  Some ideas inline below:

> I'm not sure how you collect all the dependencies for shipment, but how do
> I tell maven not to include these?
Take a look at the distribution project [1].  It defines what goes into the
distro and what is left out.
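
As a rough sketch of how an assembly descriptor can leave specific artifacts
out of a distribution, the fragment below uses the maven-assembly-plugin's
dependencySet excludes.  The groupId:artifactId values are hypothetical
placeholders, not the coordinates actually used in ctakes or ytex, and the
real descriptors in [1] may be organized differently.

```xml
<!-- Sketch only: hypothetical coordinates, not taken from the ctakes descriptors -->
<assembly>
  <id>bin</id>
  <formats>
    <format>zip</format>
  </formats>
  <dependencySets>
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <excludes>
        <!-- each pattern is groupId:artifactId -->
        <exclude>example.weka:weka</exclude>
        <exclude>example.jdbc:sqlserver-driver</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
```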

> Is it OK to check weka & jdbc into source control?
Please do not commit jars with incompatible licenses.  We would have to
remove them before anything gets distributed anyway, so it is best to avoid
it.  However, if you would like to attach them to the Jira issue or put them
in a sandbox initially to leverage the community's help, I can take a look
and lend a hand if needed, and perhaps others in the community may also be
interested in helping out.

> * desc vs <project>-res
The -res projects were originally designed for the models/resources, so
that downstream consumers do not necessarily have to include huge resource
files if they only need the code.  So, I would suggest any plain text
config source files go directly into the project and its corresponding
-res project.
[1]
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/
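
For the quoted idea of adding desc as a resources directory in the pom, a
minimal sketch might look like the following.  It assumes the module keeps
its descriptors in a top-level desc folder.  Declaring <resources> explicitly
replaces Maven's default, so src/main/resources is listed again.

```xml
<!-- Sketch only: assumes a top-level desc folder in the module -->
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
    </resource>
    <resource>
      <directory>desc</directory>
    </resource>
  </resources>
</build>
```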

> * distribution of umls concept graphs
Are the contents of those concept graphs ASL 2.0 compatible?  We will
probably need to double check whether they are modified or considered
derived works.

>* patches to other ctakes projects
I think these would be really good!  Perhaps we can even open Jira issues
and commit those in parallel.

>* post download setup
I think this is actually a good idea: to have some kind of "installer" that
guides the user through all the different download processes.  Would it be
possible to clone that to see what it would look like for ctakes?  I was
originally thinking of groovy or some other scripts, but would be curious
to see it, especially if ytex already did something like that.

+1 for the ytex projects for the time being.  We can also refactor them
into the existing projects as appropriate (once everyone has a better
understanding of the functionality).

Also, just curious- how big of a code base was this originally? I'm just
thinking about IP Clearance here (if it's required).


On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <vngarla@gmail.com> wrote:

> Hello All,
>
> I've started on the ytex-ctakes port, and have some packaging questions.
>
> * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> I understand that we will not ship these jars as part of the ctakes
> download.  Can we bundle the jars and ship them as part of an additional
> download, available via sourceforge?  Hibernate is available via maven
> central, weka and jdbc not.  I have added weka & jdbc drivers as system
> dependencies.  I'm not sure how you collect all the dependencies for
> shipment, but how do I tell maven not to include these?  Is it OK to check
> weka & jdbc into source control?
>
> * desc vs <project>-res
> What are the guidelines for what goes where?  Configuration files are found
> in both places, whereas data/models are in the -res directory.  Ytex has
> many non-uima config files (hibernate, spring) which should be
> user-modifiable, and I would put them in the desc directory.  However, desc
> is not in the project classpath (but it is in the classpath for the ctakes
> distro, e.g. in runctakesCPE.bat).  Any reason for this dissonance?  I
> would add desc as a resources directory in the pom.
>
> * distribution of umls concept graphs
> for semantic similarity and word sense disambiguation, ytex provides
> concept graphs derived from the UMLS.  We have a download site that
> requires UTS login to get these concept graphs (
> http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip).  I take it I
> would just create a -res directory and add the concept graphs here, and
> they would automagically appear in the ctakes-resources zip?
>
> * patches to other ctakes projects
> ytex has some patches to other ctakes annotators for handling edge cases
> where they throw an exception; I will check to see if these changes
> have already been made.  If not, I will file separate Jira tickets for
> these patches.  Also, the CharacterOffsetToLineTokenConverterCtakesImpl
> needs to be modified to properly handle cases where newlines are in
> sentences; I will add a patch for that as well.
>
> * post download setup
> ytex provides an ant script to simplify the post download setup (database
> schema, setup, configuration file generation).  Would it be possible to
> ship ant with the ctakes distro, so that users can execute these scripts?
>  If not, how best to automate setup?  I know from experience with earlier
> versions of ytex that setting up the database schema is error prone, and
> that this needs to be automated.
>
>
> I was planning on creating the following projects:
> * ctakes-ytex:
> Base ytex, includes semantic similarity tools.  This has no dependencies on
> ctakes, and I would create a separate distribution of just this package for
> a semantic similarity distro.
> * ctakes-ytex-res
> Includes concept graphs for semantic similarity.
> * ctakes-ytex-web
> Provides User Interface, RESTful, and WebServices interface to semantic
> similarity service.  This has no dependencies on ctakes, and this would be
> included in the semantic similarity distro.
> * ctakes-ytex-uima
> Includes ytex analysis engines
> * ctakes-ytex-uima-res
> resources for ytex analysis engines
>
> Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
> existing projects (don't know where they would fit).
>
> Best,
>
> Vijay
>
>
>
>
> On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <vngarla@gmail.com> wrote:
>
> > Hi Pei,
> >
> > The WSD annotator relies on the semantic similarity component, which
> > is a general purpose tool not strictly limited to ctakes or NLP.  I
> > would like to keep the semantic similarity component 'standalone',
> > i.e. with no dependencies on ctakes, and make it  redistributable on
> > its own.  If that is possible as part of ctakes, I'd love to move it.
> > If not, I'd leave the semantic similarity and the associated WSD
> > annotator on google code.
> >
> > For those of you who want the back story:
> > http://www.biomedcentral.com/1471-2105/13/261
> > http://jamia.bmj.com/content/20/5/882.long
> >
> >
> > -vj
> >
> > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> > <Pei.Chen@childrens.harvard.edu> wrote:
> > > vj,
> > > Were you thinking of contributing the new ytex Word Sense
> > > Disambiguation component as well? I think that would be really cool.
> > > --Pei
> > >
> > >> -----Original Message-----
> > >> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik
> > >> Sarma
> > >> Sent: Thursday, October 03, 2013 1:05 PM
> > >> To: dev@ctakes.apache.org
> > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > >>
> > >> This would be quite valuable -- in particular, ytex's annotation
> > >> database connection is much easier to use than what ships with
> > >> cTAKES. There are a fair number of other advantages, and I think
> > >> they'd all be very valuable!
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Karthik Sarma
> > >> UCLA Medical Scientist Training Program Class of 20??
> > >> Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> > >> to the House of Delegates of the American Medical Association
> > >> ksarma@ksarma.com
> > >> gchat: ksarma@gmail.com
> > >> linkedin: www.linkedin.com/in/ksarma
> > >>
> > >>
> > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <vngarla@gmail.com> wrote:
> > >>
> > >> > Hello All,
> > >> >
> > >> > I'd like to contribute ytex to ctakes.  YTEX's main feature is the
> > >> > ability to store *any* ctakes (or uima) annotation in a relational
> > >> > database (in a relational format), and the ability to export these
> > >> > annotations to ML packages (weka, libsvm, matlab, R).  All of this
> > >> > is purely declarative/via configuration.
> > >> >
> > >> > In addition, Ytex provides the following:
> > >> > * Negation Detection with Negex
> > >> > * SegmentRegexAnnotator - section detection with regular expressions
> > >> > * NamedEntityRegexAnnotator - named entity detection with regular
> > >> > expressions
> > >> > * Sentence Splitter - modified ctakes sentence splitter making
> > >> > sentence split patterns configurable (not hardcoded to \n)
> > >> >
> > >> > YTEX currently works with ctakes 2.5; I would like to upgrade it to
> > >> > the latest ctakes, and if the community is interested, contribute
> > >> > to ctakes.apache.org.
> > >> >
> > >> > A licensing question: YTEX uses Spring (apache 2.0 license),
> > >> > Hibernate (lgpl 2.1), & weka (gpl).  Are there any issues with
> > >> > including these?
> > >> >
> > >> > Cheers
> > >> >
> > >> > vj
> > >> >
> >
>
