incubator-ctakes-dev mailing list archives

From "Masanz, James J." <>
Subject [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
Date Fri, 25 Jan 2013 07:55:55 GMT
I posted on general@incubator that:

> One goal is to have a binary that contains all resources, 
> which can be used to install cTAKES on a system that does
> not have an internet connection.
> For now we can focus on a first Apache release that 
> doesn't meet that goal, while pursuing the question with legal.
> If legal says we can't have that kind of binary here, 
> then in the future we can consider
> if we will host such a binary on a different site.

Another motivation for this email is a post by Benson (below) to general@incubator, where
he writes "It's not the mission of the ASF to create complete, end-user-friendly, software
products."

I suggest we, or whoever among us are interested in such a thing, host an easy-to-install
*binary* that includes cTAKES plus the models and jars, somewhere other than, that
would be a single download with a simple unzip (and would be built off Apache cTAKES 3.0.0-incubating,
once it is released).

This binary would probably be released shortly after each Apache cTAKES release, so it could
be built from the officially released Apache cTAKES source.

From my understanding, we cannot have models in SVN here if they were built from data that
is not available to the community since the models are not "source". That's based on this
specific comment within LEGAL-157:

We also cannot have other compiled jars in our SVN here at, and therefore they cannot
be in our source release, which we are working on addressing.

For people checking out code from SVN and using maven, those are not such big issues since
maven will fetch the dependencies once we finish updating the POMs etc.

If we want to allow people to download a single binary and get the cTAKES code and the models,
it sounds like we either need to 
1) write something that would download the models for the users 
2) or host the binaries elsewhere 
(or require users to download things separately and put them together).

I strongly dislike option 1, so I will focus on option 2 in this email, as that will be more
than enough for one email anyway ;)

For people to host such an all-inclusive binary elsewhere, those people would need to choose
a name.
We could create a logo for their use, something like "Apache cTAKES inside" or "Powered by
Apache cTAKES" (see and make it
clear the binary is not being released directly by Apache.

I suggest that we wouldn't need to create a convenience binary here at Apache - one less thing
to test and document.

This would bring up several questions though, which I'm guessing we don't want to get into
here in great detail since it is really about something that is not to be released directly
from Apache.
 - what to call the binary (we would not simply be able to call it "Apache cTAKES")
 - where to host the binary (I'd suggest the ohnlp sourceforge project, where previous versions
of cTAKES live)
 - we would need a place to hold the documentation for this binary. I am assuming we could
not host it as, but we would need that either confirmed here or create a legal
Jira to get that confirmation.
 - where would we tell people to go to post questions about the binary? 
 - where would the build of the binary take place 

I suggest taking those questions offline unless someone tells me those things are indeed OK
to discuss here.

My main point to discuss here is whether there is enough value in providing a convenience
binary of Apache cTAKES here at (which would not contain the models) for us to
create and support it here, or if we skip creating a binary here at and only create
source packages here.

I am not trying to splinter the group here. I would hope anyone involved in producing the
binary would be involved here with Apache cTAKES too. But there might be people involved in
Apache cTAKES that aren't interested in the details of how a binary is produced or what it
looks like, or even if it is produced.

-- James

> -----Original Message-----
> From:
> []
> On Behalf Of Benson Margulies
> Sent: Thursday, January 24, 2013 9:23 PM
> To:
> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> It's unfortunate to have this conversation in parallel here and on
> Also, this thread is a combo of the discussion of ordinary jars-of-classes
> (where I'd forgotten the policy) and the much more tangled question of
> models, which is what the JIRA is wrestling with.
> To answer Ted, I think that Roy might write something like:
> "It's not the mission of the ASF to create complete, end-user-friendly,
> software products. It's our mission to create open source code. If someone
> else wants to build up an end-user-friendly aggregation of ASF code and
> models from bombs of whatever, that's great, and we encourage them."
> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <> wrote:
> > On 25.01.2013 01:50, Ted Dunning wrote:
> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <> wrote:
> >>
> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> >>> ...
> >>>>> I am referring to this discussion
> >>>> Well, that's clear enough, even if it is a typical example of how our
> >>>> founders yell at us but we have no mechanism to channel those yells
> >>>> into concise, unambiguous, documentation.
> >>> Perhaps off-topic ... but I fail to see how "source release" is
> >>> ambiguous or not concise.
> >>>
> >>> Unless the Java world has a different definition of "source code"
> >>> than us stuck-in-the-mud plodders, and it's only considered binary
> >>> once it's been JIT-compiled. :)
> >>>
> >>
> >> It isn't necessarily ambiguous when applied to code, but there is a
> >> different case when applied to models  or parameter settings.
> >>
> >> For instance, commons math has polynomial coefficients embedded in
> >> code that approximate certain functions.  These are the results of
> >> computations done using other systems and the source code and the
> >> data used in those other computations are not included in the
> >> released code, only the parameter values are.
> >>
> >> This same sort of thing applies here except that the model in
> >> question has a much larger set of values and is being packaged in a
> >> binary, inspectable format.  Would your opinion change if the model
> >> were expressed in a textual model?  Would it matter that the textual
> >> model is too large and obtuse to usefully inspect?
> >
> > In cases like this one, it would seem reasonable for the source code
> > to refer to those models and computations, which presumably anyone can
> > then reproduce to their own satisfaction. This is unlike compiled code
> > in that compilation results are notoriously hard to reproduce exactly,
> > because they depend on many factors that are usually hard to document,
> > let alone reproduce. I'd expect a mathematical model, no matter how
> > large, does not suffer from such ambiguities (and shut up, Gödel).
> >
> > However, that's beside the point, because ...
> >
> >> What about a hypothetical case where the model is derived from the
> >> explosion of a nuclear bomb?  Would the release of the numbers
> >> require the inclusion of a suitable bomb design so that everybody
> >> could replicate the derivation?
> >
> > ... the issue is not about exposing all the knowledge that goes
> > into writing the code, but about exposing the code itself so that it can be
> > reviewed for, e.g., back-doors and other security issues. Neither of
> > your examples is relevant.
> >
> > -- Brane
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > For additional commands, e-mail:
> >
