incubator-ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: SVN source structure for Apache cTAKES?
Date Tue, 17 Jul 2012 15:52:54 GMT
One additional discussion item for the /resources folder:
If there are reference data, models, etc. that are not tied to a specific release, should we
treat them as their own top-level component?
These models, lookup dictionaries, etc. could reach terabytes in size...

--Pei

-----Original Message-----
From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu] 
Sent: Tuesday, July 17, 2012 11:30 AM
To: ctakes-dev@incubator.apache.org
Subject: RE: SVN source structure for Apache cTAKES?

I think this is starting to look like OpenNLP's source structure:
https://svn.apache.org/repos/asf/opennlp/trunk/

If we do not need separate release cycles, then we can most likely get away with a
single trunk. In the future, if there are auxiliary components that require their own release schedule,
we can always manage them separately in their own project and SVN repository...

+1 on keeping separate jars, similar to OpenNLP.

+1 for a standard package structure and naming conventions, and also for lowercasing all directory
names and removing spaces :).

-1 on dropping src/main/java, though. If we decide to use Maven as the build utility, then I would
suggest keeping the extra src/main/java level. This is solely for ease of integration with
Maven: Maven's defaults and plugins usually expect the tree to be in that structure.
I think it actually becomes more work to customize everything to live outside of that structure.

--Pei

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Tuesday, July 17, 2012 11:04 AM
To: ctakes-dev@incubator.apache.org
Subject: RE: SVN source structure for Apache cTAKES?

+1 for a single trunk.

In my experience, even if the app is oriented around services and/or modules, planned point
releases of individual products from a single trunk do not pose a problem: you can make
a branch of the whole trunk, then develop those products on that branch while the other
products' source etc. stays static (or, hopefully, vice versa). This was useful in one case where
we had code for a database that evolved much more slowly than the products that depended on it. While
it didn't much matter to developers, according to our CM people, keeping everything in one trunk made
their work easier. I took them at their word. Please note that I am not saying
that we should or will need to have separate product releases, just that I don't think a single
trunk should prevent us from doing so.

+1 for multiple jars.
The matter of single jar vs. multiple jars is not necessarily connected to having a single
trunk or multiple trunks.
I think that separate projects should have separate jar files. This way, developers who focus
on a single project just need to check out their project's source plus the jars for each dependency.
The integration build should build each project in a top-down fashion, and if a certain project doesn't
build or pass its tests, then it doesn't get a (new) published jar. This keeps everybody
who depends on that project from being held up the next day by a broken build, as they can
check out the published jar without worrying about whether it is truly new or not;
either way, it is a working version. It goes along with the notion of "always shippable", one of those
agility things.
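To make the publish-on-success idea concrete, here is a minimal Java sketch. It is hypothetical throughout: the class and method names are invented, "top-down" is read as "dependencies first", and buildAndTest is a stand-in for whatever real compile and unit-test step the build tool would run. A project that fails keeps its last published jar version, so its dependents still build against a working jar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of a nightly integration policy: build projects
 * dependencies-first, and publish a new jar only for projects that build
 * and pass their tests. A failing project keeps its last published jar.
 */
class IntegrationBuild {

    /** Orders projects so that every dependency comes before its dependents. */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String project : new TreeSet<>(deps.keySet())) { // sorted for a stable order
            visit(project, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String project, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(project)) {
            return;
        }
        if (!inProgress.add(project)) {
            throw new IllegalStateException("Dependency cycle involving " + project);
        }
        for (String dep : deps.getOrDefault(project, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(project);
        done.add(project);
        order.add(project);
    }

    /**
     * Runs one integration cycle. buildAndTest (hypothetical) returns true when
     * a project compiles and its tests pass. The result maps each project to
     * its published jar version after this run.
     */
    static Map<String, Integer> run(Map<String, List<String>> deps,
                                    Map<String, Integer> published,
                                    Predicate<String> buildAndTest) {
        Map<String, Integer> result = new HashMap<>(published);
        for (String project : buildOrder(deps)) {
            if (buildAndTest.test(project)) {
                result.merge(project, 1, Integer::sum); // publish a new jar version
            }
            // On failure, the previously published jar stays in place.
        }
        return result;
    }
}
```

For example, with hypothetical projects core, chunker (depending on core) and coref (depending on both), a night where only chunker fails publishes new jars for core and coref, while chunker's users keep getting yesterday's working jar.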

+1 for separate top-level src/, test/, example/, and resource/ directories.
This question was not explicitly raised in this topic, but it does have something to do
with overall structure and jars (Pei does have src/ and resource/ in his post). I like the
idea of having one root directory (under each project) for source, one for tests, and one
for examples. All of these directories share the same package structure. I have a few reasons for
doing this. The test/ directory keeps my src/ directory from getting cluttered with files
that are tests and not source, which makes browsing (in and out of an IDE) faster. For that
matter, it makes for a smaller and simpler source tree than having test/ subdirectories
(which seems to be a common practice) all over the place. The example/ directory also keeps
source directories from becoming cluttered, and for anybody new to the code base it can make
finding decent examples of what they want easier and faster. In addition, it keeps the source
code free of long main() methods (which also seem to be a common practice) and other
methods that are necessary for examples but not for the purpose of the class. Having examples
in an example/ directory also makes it obvious to a new developer that they are examples and
not old (non-JUnit) tests (which, by the way, we need to extract). I also have a separate resource/
root directory (as in the original post), which reduces clutter and makes browsing easier,
etc. Another thing that these separate root directories make possible is lighter jar files.
One can build and test everything, but publish a jar with just the src/. That makes dependency
updates faster for people who don't need the test and example code. cTAKES isn't that big, but it is something
to keep in mind. A very minor point is that people should regularly be checking in tests
(and a few examples). With all the code in one src/ root, it is difficult to notice whether
or not somebody is being responsible in this regard. However, with separate roots it is very simple to survey
large check-ins at a glance and see whether anything went into test/. If there
are a dozen new classes checked into src/ and nothing into test/, then the committer might
need a friendly reminder to write tests for the new code. For that matter, if a project starts
to look src/-heavy and test/-light (easy to see), then we can try to schedule a test-writing
iteration. Once again, I'll agree that writing tests can be a pain. However, it does make
things easier in the long run, especially in projects with multiple developers who come and
go. One last note on this: sometimes there is a structure such as src/main/ &
src/test/. I don't like this because it adds an unnecessary level to the tree.

+1 for top-level separation of code in different languages.
I don't like structures like src/main/java/ & src/main/cpp/. If there is code in two
languages, then that differentiation should be made at a higher level, such as java/src/ &
cpp/src/ (plus java/test/ & cpp/test/, etc.). That way, if I work only on Java code, I
can still check out a single src/ directory, and don't need to check out something silly like java/
without an src/ because the src/ is a level or two up and includes source in other languages
that I don't want. If I do check out all of src/, then even if the cpp/ branch never changes,
my sandbox is still muddied up with extra files that I don't need. The cpp/ (or whatever)
code should be a separately built resource that I don't need to build myself but can check out
on a daily basis.

+1 to separate roots for each major (sub)project under one trunk.
This goes somewhat hand-in-hand with single vs. multiple jars, so maybe I'm being redundant.
I don't think that there is any controversy, but I want to put it here for posterity and
just in case anybody has a better idea. Currently we've got major projects within cTAKES,
like core, the GUI, etc. It may be rare for any developer to work on more than one project
at a time (or ever), so they probably don't want to check out mixed code for all projects,
just the code for their project and the published jars for its dependencies.

+1 to a single common package structure.
I probably shouldn't need to say this, but our current code base has this problem, so I will.
Different projects (with separate locations) should have a common package structure. In
other words, project A should not have the package structure org.apache.ctakes.A.annotation.*
while project B has org.apache.ctakes.B.frog.leg.annotation.*. I would prefer that whichever
structure is formulated first wins. If project A made its structure first, then project B
should endeavor to follow its lead with something like org.apache.ctakes.B.annotation.frog.leg.*
While this may seem completely unnecessary, it really does (IMHO) make keeping
things straight in my mind a lot easier when I work in/on multiple projects. Plus, if there
are dependencies, it looks crazy when import statements don't follow a single structure.
It gets really bad (we've all seen this) when a single project or code base has multiple packages
with the same name at different levels of a single tree. For instance, it makes sense to
have A.frog.leg.* and A.toad.leg.*, but A.frog.leg and A.toad.appendage.leg is a strong candidate
for refactoring, on one side or the other. Even worse is A.frog.leg and A.frog.appendage.leg.
If packages in different projects or the same project have the same name but do not have
anything to do with each other, then they probably should not have the same name, regardless
of what level they occupy in the tree.
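Part of this rule can be checked mechanically. The Java sketch below is a hypothetical lint (the class and method names are invented, not an existing cTAKES or build-tool feature): it flags any package name, meaning the last segment such as "leg", that occurs at more than one depth of the package tree. It cannot tell whether two same-named packages at the same depth are actually related; that part still needs a human.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical lint: flags package names (last segment) that appear at
 * more than one depth, e.g. a.frog.leg vs. a.toad.appendage.leg.
 */
class PackageNameCheck {

    /** Maps each last package segment to the set of depths at which it occurs. */
    static Map<String, Set<Integer>> leafDepths(Collection<String> packages) {
        Map<String, Set<Integer>> depths = new TreeMap<>();
        for (String pkg : packages) {
            String[] segments = pkg.split("\\.");
            String leaf = segments[segments.length - 1];
            depths.computeIfAbsent(leaf, k -> new TreeSet<>()).add(segments.length);
        }
        return depths;
    }

    /** Returns the package names that occur at more than one depth. */
    static Set<String> suspiciousNames(Collection<String> packages) {
        Set<String> flagged = new TreeSet<>();
        leafDepths(packages).forEach((leaf, levels) -> {
            if (levels.size() > 1) {
                flagged.add(leaf);
            }
        });
        return flagged;
    }
}
```

For example, org.apache.ctakes.a.frog.leg together with org.apache.ctakes.a.toad.leg passes, while org.apache.ctakes.a.frog.leg together with org.apache.ctakes.a.toad.appendage.leg gets "leg" flagged as a refactoring candidate.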

Ok, I think that is all that I've got for the 30,000 ft. structure.

Cheers,
Sean

-----Original Message-----
From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Monday, July 16, 2012 7:30 PM
To: <ctakes-dev@incubator.apache.org>
Subject: Re: SVN source structure for Apache cTAKES?

+1 to having a shared trunk. In Apache OODT, we tried to separate them
(and did so at JPL prior to bringing the software to Apache); however, we found that folks
want a fully compatible Apache release, including compatible versions of the subcomponents.
See OODT-15 [1] for our discussion and decision to keep it as one trunk.

Cheers,
Chris

[1] https://issues.apache.org/jira/browse/OODT-15

On Jul 16, 2012, at 3:29 PM, Chen, Pei wrote:

> https://issues.apache.org/jira/browse/CTAKES-10?focusedCommentId=13415605#comment-13415605
> How should the new SVN structure look for Apache cTAKES?
> 
> Currently in SF, it looks like:
> {cTAKES-root}
> /branches
> /tags
> /trunk
> -/cTAKES
>   -/core
>     /src
>     /desc
>  -/chunker
>  -/coref-resolver
>  etc.
> This means that all of those projects are children of trunk and will share the same release cycle.
> 
> One alternative looks something like this (each component could have its own trunk/jar file?):
> {cTAKES-root}
> -/ctakes-core
>  /trunk
>   /src
>    /java
>     /main
>  /resources
>  /branches
>  /tags
> -/ctakes-chunker
>  /trunk
>   /src
>    /java
>     /main
>  /resources
>  /branches
>  /tags
> -/ctakes-coreference
>  /trunk
>   /src
>    /java
>     /main
>  /resources
>  /branches
>  /tags
> 
> There are pros and cons to both, but let's get the discussion started, as this will be required for the code migration.
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of Southern California,
Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

