incubator-ctakes-dev mailing list archives

From vijay garla <>
Subject Re: SVN source structure for Apache cTAKES?
Date Wed, 18 Jul 2012 23:07:54 GMT

To start with a disclaimer: I'm not a cTAKES committer but I do use (and
modify) cTAKES quite a bit.  So, my $0.02:

> +1 for multiple jars.
I've found it to be very problematic that there is a single cTAKES.jar with
all code and dependencies kludged in - this makes it very hard to use later
versions of a library cTAKES is using, even if it is backwards compatible.

Case in point: LVG.  The HSqlDB jars from LVG 2008 are in the cTAKES jar,
and this version of HSqlDB is not compatible with LVG 2011.  I could put
the LVG 2011 libraries in front of cTAKES.jar in the classpath, but that is
messy and might cause other problems.
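
When classpath order is in doubt, it can help to ask the JVM which jar a
class was actually loaded from.  What follows is only a minimal sketch, not
anything that ships with cTAKES or LVG: JarLocator and locationOf are
hypothetical names, and the class name you pass in (for example
org.hsqldb.jdbcDriver, assuming that is the driver class in your HSqlDB
version) is up to you.

```java
import java.security.CodeSource;

public class JarLocator {

    /**
     * Returns the jar or directory a class was loaded from, or a
     * placeholder when the JVM reports no code source (as is typical
     * for core JDK classes).
     */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // The class name is supplied by the caller; the default is only a stand-in.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

Run it with the same classpath as the application and pass the fully
qualified class name as the only argument; the printed location shows
whether the copy inside cTAKES.jar or the newer library won.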

> -1 for separate top-level src/   test/   example/   and   resource/
The src/main/java and src/main/cpp structure is the default Maven
structure, and I believe Apache is using Maven for the build.  Yes, you can
override this in your Maven config, but it is a (little) pain.  I'm neither
a big fan of this structure nor of Maven, but I would suck it up.



On Wed, Jul 18, 2012 at 6:01 PM, Mattmann, Chris A (388J) <> wrote:

> Agree with Sean's assessment below on all points.
> Cheers,
> Chris
> On Jul 17, 2012, at 10:03 AM, Finan, Sean wrote:
> > +1 for a single trunk.
> >
> > In my experience, even if the app is oriented around services and/or
> modules, planned point releases of individual products in a single trunk
> do not pose a problem, as you can make a branch of the whole trunk, then
> let those products be developed on that branch where other product source
> etc. is static (or hopefully vice-versa).  This was useful in one case
> where we had code for a database that evolved much more slowly than other
> dependent products.  While it didn't much matter to developers, according
> to our CM keeping everything in one trunk made efforts easier on their
> side.  I took them at their word.  Please note that I am not saying that we
> should or will need to have separate product releases, just that I don't
> think a single trunk should prevent us from doing so.
> >
> > +1 for multiple jars.
> > The matter of single jar vs. multiple jars is not necessarily connected
> to having a single or multiple trunks.
> > I think that separate projects should have separate jar files.  This way
> developers who focus on a single project just need to check out their
> project's source and jars for each dependency.  Integration should build
> each project in a top-down fashion and if a certain project doesn't
> test-out or build properly then it doesn't get a (new) published jar.  This
> keeps everybody dependent upon that project from being held up the next day
> with a broken build as they can check out the published jar without really
> worrying about whether it is truly new or not, it is a working version.  It
> goes along with the notion of "always shippable", one of those agility
> things.
> >
> > +1 for separate top-level src/   test/   example/   and   resource/
> directories.
> > This question was not explicitly mentioned in this topic, but it does
> have something to do with overall structure and jars (Pei does have src/
> and resource/ in his post).  I like the idea of having one root directory
> (under each project) for source, one for tests, and one for examples.  All
> directories share the same package structure.  I have a few reasons for
> doing this.  The test/ directory keeps my src/ directory from getting
> cluttered with files that are tests and not source, which makes browsing
> (in and out of IDE) faster.  For that matter, it makes for a smaller and
> simpler source tree than having test/ subdirectories (which seems to be
> a common practice) all over the place.  The example/ directory also keeps
> source directories from becoming cluttered, and for anybody new to the code
> base it can make finding decent examples for what they want easier and
> faster.  In addition, it keeps the source code from having long main()
> methods (which also seems to be a common practice) and other methods that
> are necessary for examples but not the purpose of the class.  Having
> examples in an example/ directory also makes it obvious to a new developer
> that they are examples and not old (non junit) tests (which, btw we need to
> extract).  I also have a separate resource/ root directory (such as in the
> original post), which reduces clutter and makes browsing easier etc.
>  Another thing that these separate root directories make possible is
> lighter jar files.  One can build and test everything, but publish a jar
> with just the src/.  That makes dependency updates faster for people that
> don't need the code.  cTAKES isn't that big, but it is something to keep in
> mind.  A very minor point is that people should regularly be checking in
> tests (and a few examples).  With all the code in one src/ root, it is
> difficult to notice whether or not somebody is being responsible in this
> regard.  However, it is very simple to survey at a glance root directories
> with large checkins and see if anything is in test/.  If there are a dozen
> new classes checked into src/ and nothing into test/ then the committer
> might need a friendly reminder to write tests for the new code.  For that
> matter, if a project starts to look src/ heavy and test/ light (easy to
> see), then we can try to schedule a test-writing iteration.  Once again
> I'll agree that writing tests can be a pain.  However, it does make things
> easier in the long run, especially in projects with multiple developers who
> come and go.  One last note on this is that sometimes there is a structure
> such as src/main/ &  src/test/.  I don't like this because it adds an
> unnecessary level to the tree.
> >
> > +1 for top level separation of code in different languages.
> > I don't like structures like src/main/java/ & src/main/cpp/.  If there
> is code in two languages, then that differentiation should be made at a
> higher level, such as java/src/ & cpp/src/ (plus java/test/ & cpp/test/,
> etc.).  That way if I work only on Java code I can still check out a src/
> directory, and don't need to check out something silly like java/ without
> an src/ because the src/ is a level or two up and includes source in other
> languages that I don't want.  If I do check out all of src/, and even if
> the cpp/ branch never changes, my sandbox is still muddied up with extra
> files that I don't need.  The cpp/ (or whatever) should be a separately
> built resource that I don't need to build myself but can check out on a
> daily basis.
> >
> > +1 to separate roots for each major (sub)project under one trunk.
> > This goes somewhat hand-in-hand with single vs. multiple jars, so maybe
> I'm being redundant.  I don't think that there is any controversy, but I
> want to put it here for posterity and just in case anybody has a better
> idea.  Currently we've got major projects within cTAKES like core, the GUI,
> etc.  It may be rare for any developer to work on more than one project at
> a time (or ever), so they probably don't want to check out mixed code for
> all projects - just code for their project and published jars for
> dependencies.
> >
> > +1 to a single common package structure.
> > I probably shouldn't need to say this, but our current code base has
> this problem so I will.  Different projects (with separated locations)
> should have a common package structure.  In other words, project A should
> not have package structure org.apache.ctakes.A.annotation.* while project B
> has org.apache.ctakes.B.frog.leg.annotation.*  I would prefer that
> whichever structure is formulated first wins.  If project A made its
> structure first, then project B should endeavor to follow its lead with
> something like org.apache.ctakes.B.annotation.frog.leg.*  While this may
> seem like it is completely unnecessary, it really does (imho) make keeping
> things straight in my mind a lot easier when I work in/on multiple
> projects.  Plus, if there are dependencies it looks crazy when include
> statements don't follow a single structure.  It gets really bad (we've all
> seen this) when a single project or code base has multiple packages with
> the same name at different levels of a single tree.  For instance, it makes
> sense to have A.frog.leg.* and A.toad.leg.*, but A.frog.leg and
> A.toad.appendage.leg is a strong candidate for refactoring, on one side or
> the other.  Even worse is A.frog.leg and A.frog.appendage.leg.  If packages
> in different projects or the same project have the same name but do not
> have anything to do with each other then they probably should not have the
> same name, regardless of what level they occupy in the tree.
> >
> > Ok, I think that is all that I've got for the 30,000 ft. structure.
> >
> > Cheers,
> > Sean
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (388J) []
> > Sent: Monday, July 16, 2012 7:30 PM
> > To: <>
> > Subject: Re: SVN source structure for Apache cTAKES?
> >
> > +1 to having a shared trunk. In Apache OODT, we tried to separate them
> > (and prior to bringing the software to Apache did so at JPL), however we
> found that folks want a fully compatible Apache release, including
> compatible versions of the sub components. See OODT-15 [1] for our
> discussion and decision to keep it as 1 trunk.
> >
> > Cheers,
> > Chris
> >
> > [1]
> >
> > On Jul 16, 2012, at 3:29 PM, Chen, Pei wrote:
> >
> >>
> >> 605#comment-13415605 what should the new SVN structure look like for
> >> Apache cTAKES?
> >>
> >> Currently in SF, it looks like:
> >> {cTAKES-root}
> >> /branches
> >> /tags
> >> /trunk
> >> -/cTAKES
> >>  -/core
> >>    /src
> >>    /desc
> >> -/chunker
> >> -/coref-resolver
> >> Etc..
> >> Which means that all of those projects are all children of trunk and
> will share the same release cycle.
> >>
> >> One alternative option looks something like (each component could have
> its own trunk/jar file?):
> >> {cTAKES-root}
> >> -/ctakes-core
> >> /trunk
> >>  /src
> >>   /java
> >>    /main
> >> /resources
> >> /branches
> >> /tags
> >> -/ctakes-chunker
> >> /trunk
> >>  /src
> >>   /java
> >>    /main
> >> /resources
> >> /branches
> >> /tags
> >> -/ctakes-coreference
> >> /trunk
> >>  /src
> >>   /java
> >>    /main
> >> /resources
> >> /branches
> >> /tags
> >>
> >> There are pros and cons to both, but let's get the discussion started
> as this will be required for the code migration.
> >>
> >
> >
