ctakes-dev mailing list archives

From vijay garla <vnga...@gmail.com>
Subject Re: move ytex annotators to ctakes.apache.org?
Date Mon, 21 Oct 2013 23:02:36 GMT
More questions:


On Mon, Oct 21, 2013 at 4:26 PM, Pei Chen <chenpei@apache.org> wrote:

> Hi Vijay,
> This is awesome.  Some ideas inline below:
>
> >I'm not sure how you collect all the dependencies for shipment, but how do
> I tell maven not to include these?
> Take a look at the distribution project [1].  It defines what gets put in
> and out of the distro.
>
> > Is it OK to check weka & jdbc into source control?
> Please do not commit the non-compatible license jars.  We will have to
> remove them before it gets distributed anyway so best to avoid it.  However,
> if you would like to include it in the Jira as an attachment/Sandbox
> initially to leverage the community's help, I can also take a look at it
> and lend a helping hand if needed- and perhaps others in the community may
> also be interested in helping out.
>

These libraries can be included via maven (not checked in) and excluded
when creating the distro - that gets around having non-compatible jars in
source control/distro.  The main question is, how to ship these?  As part
of the resources jar?


>
> > * desc vs <project>-res
> The -res projects were originally designed for the models/resources, so
> that downstream consumers do not necessarily have to include huge resource
> files if they only need the code.  So, I would suggest any plain text
> config source files go directly into the project and its corresponding
> -res project.
> [1]
>
> https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/
>
> > * distribution of umls concept graphs
> Are the contents of those concept graphs ASL 2.0 compatible?  Probably will
> need to double check to see if it's modified/considered derived works?
>

These should be handled identically to the UMLS hsql dictionary shipped in
resources - the concept graphs are derived from UMLS level 0 sources +
SNOMED CT


>
> >* patches to other ctakes projects
> I think these would be really good!  Perhaps we can even open Jiras and
> commit those in parallel.
>
> >* post download setup
> I think this is actually a good idea- to have some kind of "installer" that
> guides the user through all the different download processes.  Would it be
> possible to clone that to see what it would look like for ctakes?  I was
> originally thinking of groovy or some other scripts, but would be curious
> to see especially if ytex already did something like that.
>

I am just using plain vanilla ant.  I am open to doing this 'the' ctakes
way, but it doesn't sound like there is an established mechanism.  I am
open to using any scripting language that is already included in ctakes.


>
> >* +1 for the ytex projects for the time being.  We can also refactor them
> into the existing projects as appropriate (once everyone has a better
> understanding of the functionality?)
>
> Also, just curious- how big of a code base was this originally? I'm just
> thinking about IP Clearance here (if it's required).
>

~250 Java source files, ~1 MB of source (I didn't do a line count).  The
original code is under the Apache License 2.0.
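
For the missing line count, a short script would be enough.  The following
is a minimal sketch in Python and is not part of ytex or ctakes; the function
name count_java_sources and the idea of pointing it at a source checkout are
assumptions made for illustration.

```python
import os


def count_java_sources(root):
    """Return (file_count, line_count, byte_count) for *.java files under root.

    Hypothetical helper for sizing a code base; it walks the directory tree
    and only looks at files whose names end in ".java".
    """
    files = 0
    lines = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            files += 1
            size += os.path.getsize(path)
            # Read as bytes so odd encodings in source files cannot fail the count.
            with open(path, "rb") as handle:
                lines += sum(1 for _ in handle)
    return files, lines, size
```

Calling count_java_sources on the root of a checkout returns the three
totals as a tuple, which gives the file count, line count, and size in bytes
in one pass.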


>
>
> On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <vngarla@gmail.com> wrote:
>
> > Hello All,
> >
> > I've started on the ytex-ctakes port, and have some packaging questions.
> >
> > * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> > I understand that we will not ship these jars as part of the ctakes
> > download.  Can we bundle the jars and ship them as part of an additional
> > download, available via sourceforge?  Hibernate is available via maven
> > central, weka and jdbc not.  I have added weka & jdbc drivers as system
> > dependencies.  I'm not sure how you collect all the dependencies for
> > shipment, but how do I tell maven not to include these?  Is it OK to
> > check weka & jdbc into source control?
> >
> > * desc vs <project>-res
> > What are the guidelines for what goes where?  Configuration files are
> > found in both places, whereas data/models are in the -res directory.
> > Ytex has many non-uima config files (hibernate, spring) which should be
> > user-modifiable, and I would put them in the desc directory.  However,
> > desc is not in the project classpath (but it is in the classpath for the
> > ctakes distro, e.g. in runctakesCPE.bat).  Any reason for this
> > dissonance?  I would add desc as a resources directory in the pom.
> >
> > * distribution of umls concept graphs
> > for semantic similarity and word sense disambiguation, ytex provides
> > concept graphs derived from the UMLS.  We have a download site that
> > requires UTS login to get these concept graphs (
> > http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip).  I take it I
> > would just create a -res directory and add the concept graphs here, and
> > they would automagically appear in the ctakes-resources zip?
> >
> > * patches to other ctakes projects
> > ytex has some patches to other ctakes annotators for handling edge cases
> > where they throw an exception; I will check to see if these changes
> > have already been made.  If not, I will file separate Jira tickets for
> > these patches.  Also, the CharacterOffsetToLineTokenConverterCtakesImpl
> > needs to be modified to properly handle cases where newlines are in
> > sentences; I will add a patch for that as well.
> >
> > * post download setup
> > ytex provides an ant script to simplify the post download setup (database
> > schema, setup, configuration file generation).  Would it be possible to
> > ship ant with the ctakes distro, so that users can execute these scripts?
> >  If not, how best to automate setup?  I know from experience with earlier
> > versions of ytex that setting up the database schema is error prone, and
> > that this needs to be automated.
> >
> >
> > I was planning on creating the following projects:
> > * ctakes-ytex:
> > Base ytex, includes semantic similarity tools.  This has no dependencies
> > on ctakes, and I would create a separate distribution of just this
> > package for a semantic similarity distro.
> > * ctakes-ytex-res
> > Includes concept graphs for semantic similarity.
> > * ctakes-ytex-web
> > Provides User Interface, RESTful, and WebServices interface to semantic
> > similarity service.  This has no dependencies on ctakes, and this would
> > be included in the semantic similarity distro.
> > * ctakes-ytex-uima
> > Includes ytex analysis engines
> > * ctakes-ytex-uima-res
> > resources for ytex analysis engines
> >
> > Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res to
> > existing projects (don't know where they would fit).
> >
> > Best,
> >
> > Vijay
> >
> >
> >
> >
> > On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <vngarla@gmail.com> wrote:
> >
> > > Hi Pei,
> > >
> > > The WSD annotator relies on the semantic similarity component, which
> > > is a general purpose tool not strictly limited to ctakes or NLP.  I
> > > would like to keep the semantic similarity component 'standalone',
> > > i.e. with no dependencies on ctakes, and make it  redistributable on
> > > its own.  If that is possible as part of ctakes, I'd love to move it.
> > > If not, I'd leave the semantic similarity and the associated WSD
> > > annotator on google code.
> > >
> > > For those of you who want the back story:
> > > http://www.biomedcentral.com/1471-2105/13/261
> > > http://jamia.bmj.com/content/20/5/882.long
> > >
> > >
> > > -vj
> > >
> > > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> > > <Pei.Chen@childrens.harvard.edu> wrote:
> > > > vj,
> > > > > Were you thinking of contributing the new ytex Word Sense
> > > > > Disambiguation component as well?  I think that would be really cool.
> > > > --Pei
> > > >
> > > >> -----Original Message-----
> > > > >> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik Sarma
> > > >> Sent: Thursday, October 03, 2013 1:05 PM
> > > >> To: dev@ctakes.apache.org
> > > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > > >>
> > > > >> This would be quite valuable -- in particular, ytex's annotation
> > > > >> database connection is much easier to use than what ships with
> > > > >> cTAKES. There are a fair number of other advantages, and I think
> > > > >> they'd all be very valuable!
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Karthik Sarma
> > > >> UCLA Medical Scientist Training Program Class of 20??
> > > >> Member, UCLA Medical Imaging & Informatics Lab Member, CA Delegation
> > > >> to the House of Delegates of the American Medical Association
> > > >> ksarma@ksarma.com
> > > >> gchat: ksarma@gmail.com
> > > >> linkedin: www.linkedin.com/in/ksarma
> > > >>
> > > >>
> > > > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <vngarla@gmail.com> wrote:
> > > >>
> > > >> > Hello All,
> > > >> >
> > > > >> > I'd like to contribute ytex to ctakes.  YTEX's main feature is the
> > > > >> > ability to store *any* ctakes (or uima) annotation in a relational
> > > > >> > database (in a relational format), and the ability to export these
> > > > >> > annotations to ML packages (weka, libsvm, matlab, R).  All of this
> > > > >> > is purely declarative/via configuration.
> > > >> >
> > > > >> > In addition, Ytex provides the following:
> > > > >> > * Negation Detection with Negex
> > > > >> > * SegmentRegexAnnotator - section detection with regular expressions
> > > > >> > * NamedEntityRegexAnnotator - named entity detection with regular
> > > > >> > expressions
> > > > >> > * Sentence Splitter - modified ctakes sentence splitter making
> > > > >> > sentence split patterns configurable (not hardcoded to \n)
> > > >> >
> > > > >> > YTEX currently works with ctakes 2.5; I would like to upgrade it
> > > > >> > to the latest ctakes, and if the community is interested,
> > > > >> > contribute to ctakes.apache.org.
> > > >> >
> > > > >> > A licensing question: YTEX uses Spring (apache 2.0 license),
> > > > >> > Hibernate (lgpl 2.1), & weka (gpl).  Are there any issues with
> > > > >> > including these?
> > > >> >
> > > >> > Cheers
> > > >> >
> > > >> > vj
> > > >> >
> > >
> >
>
