ctakes-dev mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: Creating Runnable .JARs From A Subset of cTAKES Maven Modules
Date Tue, 01 Oct 2013 18:48:47 GMT
Rob,
Are you pulling the existing ctakes dependencies from Maven Central, or
did you have to recreate the ctakes modules in a local repo of some sort?
It would be good to make ctakes flexible enough to do what you described
(hence separating out modules and resources into their own modules).
--Pei


On Tue, Oct 1, 2013 at 2:06 PM, Robert Spurrier <
robert.spurrier@explorys.com> wrote:

> It's been a while, but just to update in case anyone is watching this:
>
> My goal was to create a project full of annotators (both cTAKES and
> home-grown), and "cherry-pick" from them at will to create smaller
> pipelines that could be launched on a hadoop grid via MapReduce.
>
> My final setup consisted of two Maven aggregator projects, Annotators and
> Pipelines.
>
> Annotators is an aggregator project containing all of the annotators and
> their resources.  I am essentially following the cTAKES layout for this
> one. One annotator, one module.
> E.g.:
> Annotators
>         -ctakes-core-annotator
>                 Pom.xml
>         -ctakes-pos-tagger-annotator
>                 Pom.xml
>         -custom-annotator-one
>                 Pom.xml
> ParentPom.xml
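>
> A parent POM for an aggregator laid out like this might look roughly like
> the sketch below. The module names come from the layout above; the groupId,
> artifactId, and version are placeholders assumed for illustration and are
> not taken from the actual project.
>
> ```xml
> <!-- Hypothetical parent POM for the Annotators aggregator.
>      groupId, artifactId, and version are assumed placeholders. -->
> <project xmlns="http://maven.apache.org/POM/4.0.0"
>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>com.example.nlp</groupId>
>   <artifactId>annotators</artifactId>
>   <version>1.0.0-SNAPSHOT</version>
>   <!-- An aggregator project must use pom packaging -->
>   <packaging>pom</packaging>
>   <!-- One annotator, one module -->
>   <modules>
>     <module>ctakes-core-annotator</module>
>     <module>ctakes-pos-tagger-annotator</module>
>     <module>custom-annotator-one</module>
>   </modules>
> </project>
> ```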
>
>
> Pipelines is another aggregator project containing the source code to
> generate the pipelines, and the job files that utilize the pipelines on
> the hadoop grid (effectively serving as the input reader & CAS consumer).
> Each pipeline is its own Maven module, and spits out a .jar that contains
> all of the classes I need to run a UIMA-MapReduce job for that specific
> pipeline. It also creates a resource archive (model files, etc) that I
> ship off to the Hadoop DistributedCache.
> E.g.:
> Pipelines
>         -custom-base-pipeline
>                 Pom.xml
>         -observation-pipeline
>                 Pom.xml
> ParentPom.xml
>
>
>
> Notes:
> -I modified the cTAKES pom to put all of the descriptors into each
> individual annotator jar as well as the classes, just so they can
> conveniently be called by name. The "heavier" resources are put on the
> DistributedCache.
>
> -I create individual pipeline distributions in the Pipelines project by
> using the Maven reactor at the parent project level. E.g. "mvn
> package -pl custom-base-pipeline -am". This builds custom-base-pipeline
> with all of its dependencies, and all of the necessary resources.
>
> -Each pipeline has its own Maven assembly to specify what should be
> included with that pipeline's distribution and resources.
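>
> As a rough illustration of a per-pipeline assembly, a descriptor along the
> following lines would bundle a pipeline module and its runtime dependencies
> into one jar. This is a minimal sketch, not the descriptor actually used:
> the id "pipeline" is an assumed name, and a real setup would likely need a
> second descriptor (e.g. with fileSets) for the resource archive.
>
> ```xml
> <!-- Hypothetical assembly descriptor for one pipeline module -->
> <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
>           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>           xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
>   <!-- Assumed id; it is appended to the artifact name as a classifier -->
>   <id>pipeline</id>
>   <formats>
>     <format>jar</format>
>   </formats>
>   <includeBaseDirectory>false</includeBaseDirectory>
>   <dependencySets>
>     <dependencySet>
>       <outputDirectory>/</outputDirectory>
>       <!-- Unpack each dependency so its classes land in a single jar -->
>       <unpack>true</unpack>
>       <useProjectArtifact>true</useProjectArtifact>
>       <scope>runtime</scope>
>     </dependencySet>
>   </dependencySets>
> </assembly>
> ```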
>
>
> The point of this was to maximize modularity, pipeline flexibility,
> runtime speed, and to keep my pipeline jars as lightweight as possible.
> Though it has many awesome features, I did not want to run every part of
> cTAKES every time.
>
>
> Cheers,
> Rob
>
> On 9/9/13 11:23 AM, "Robert Spurrier" <robert.spurrier@explorys.com>
> wrote:
>
> >Actually after poking around in Maven documentation I think I have just
> >figured out an approach I like.
> >
> >For each pipeline I wish to create, I will generate a Maven assembly
> >descriptor. I will put each assembly file in the cTAKES root pom.xml.
> >Hopefully this will create each pipeline for me when I run 'package'. This
> >approach will still tie in nicely with the project object model/lifecycle
> >of cTAKES, and generate all my custom jars as well.
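> >
> >A sketch of how an assembly descriptor can be bound to the 'package'
> >phase is shown below. The execution id and descriptor path are
> >hypothetical names chosen for illustration; only the plugin coordinates
> >and the 'single' goal are standard maven-assembly-plugin usage.
> >
> >```xml
> ><build>
> >  <plugins>
> >    <plugin>
> >      <groupId>org.apache.maven.plugins</groupId>
> >      <artifactId>maven-assembly-plugin</artifactId>
> >      <executions>
> >        <execution>
> >          <!-- Hypothetical id; one execution per pipeline -->
> >          <id>lvef-pipeline</id>
> >          <!-- Run the assembly whenever 'package' runs -->
> >          <phase>package</phase>
> >          <goals>
> >            <goal>single</goal>
> >          </goals>
> >          <configuration>
> >            <descriptors>
> >              <!-- Hypothetical path to the pipeline's descriptor -->
> >              <descriptor>src/main/assembly/lvef-pipeline.xml</descriptor>
> >            </descriptors>
> >          </configuration>
> >        </execution>
> >      </executions>
> >    </plugin>
> >  </plugins>
> ></build>
> >```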
> >
> >I will try it out and update this thread with the results
> >
> >Thanks,
> >Rob
> >
> >
> >On 9/9/13 10:38 AM, "Chen, Pei" <Pei.Chen@childrens.harvard.edu> wrote:
> >
> >>Hi Robert,
> >>
> >>Are you planning a process to build everything from source?
> >>Or were you planning to have a build process that combines the ctakes-***
> >>jars with your custom application jars?
> >>
> >>--Pei
> >>
> >>> -----Original Message-----
> >>> From: Robert Spurrier [mailto:robert.spurrier@explorys.com]
> >>> Sent: Monday, September 09, 2013 9:27 AM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Creating Runnable .JARs From A Subset of cTAKES Maven Modules
> >>>
> >>> Good Morning!
> >>>
> >>> I am trying to use cTAKES tools on a distributed computing platform. I
> >>> would rather not ship the entire compiled cTAKES package (~1.5 GB) out to
> >>> the shared cache when I only need a few annotators and their resources at
> >>> a time.
> >>>
> >>> I should first mention that I am not very familiar with Maven. I recently
> >>> upgraded cTAKES from v 2.5.0, where I was configuring smaller pipelines
> >>> using ant build files. This process was cumbersome however, and I can
> >>> appreciate the new modular Maven project layout.  I just do not know how
> >>> to effectively utilize it in a way that is flexible.
> >>>
> >>> Does anyone have any advice on how I can package subsets of cTAKES
> >>> annotator modules and their dependencies/resources, so  I can create
> >>> 'thinner' custom pipelines that are geared towards specific tasks?
> >>>
> >>> For example, I might ultimately want a pipeline .JAR that contains the
> >>> tools to RegEx Left Ventricular Ejection Fraction measurements from free
> >>> text. In such a .JAR I would not need any of the dictionary resources or
> >>> negation annotators, so they could be excluded.
> >>>
> >>> It looks like I could create Maven assembly plugin descriptors to
> >>> generate these custom .JARs, but I would like to see if anyone here has
> >>> any advice/caveats before I pursue this route.
> >>>
> >>>
> >>> Thanks,
> >>> Robert Spurrier
> >
> >
>
>
>
