incubator-ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
Date Fri, 25 Jan 2013 15:41:31 GMT
Based on the ongoing discussions,
Could I suggest we cancel the VOTE on RC5 and create an RC6?
RC6 will be extremely conservative:
- No resources (models) included in src/main/java
- No resources (models) included in the -bin.tar.gz
- Move all of the models and resources to a ctakes-models project within ctakes-resources
on SourceForge (already used for the UMLS resources).
- Update the pom.xml files so that Maven downloads those resources for developers.
- End users will have to download and unzip a ctakes-resources.zip which contains all of the
models and resources (including UMLS).
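
For illustration only, the end-user step in the last bullet could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not part of the proposed release: the function name `unpack_resources`, the archive location, and the install directory are all hypothetical, and the actual layout of ctakes-resources.zip is not settled here.

```python
import zipfile
from pathlib import Path


def unpack_resources(archive: str, install_dir: str) -> list[str]:
    """Extract a resources archive into install_dir; return the extracted file names.

    Hypothetical helper: the archive name and target directory are assumptions.
    """
    target = Path(install_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Verify the download first; testzip() returns the first corrupt member, or None.
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt entry in archive: {bad}")
        for name in zf.namelist():
            # Refuse entries that would be written outside the install directory.
            dest = (target / name).resolve()
            if not dest.is_relative_to(target):
                raise ValueError(f"unsafe path in archive: {name}")
        zf.extractall(target)
        # Report files only, skipping directory entries.
        return [n for n in zf.namelist() if not n.endswith("/")]
```

Checking the archive before extracting matters here because a partially downloaded zip of several hundred MB is a likely failure mode for end users.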

I believe this is just a temporary measure (at least a decent compromise) until we get clarity
on some of these items.
We can create subsequent releases afterwards, such as a single -bin.tar.gz that includes the
models just like any other 3rd party lib, and then possibly include them in src as well.

I do not think this is an "end user friendly" issue. IMHO, it just doesn't make sense to separate
out parts of the software that are an integral part of it and are always required for it to
function properly, such as icons, gifs, jpgs, or, in this case, statistical models (which their
contributors have approved for release under ASL 2.0 terms).

--Pei


> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Friday, January 25, 2013 10:11 AM
> To: 'ctakes-dev@incubator.apache.org'
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> 
> > -----Original Message-----
> > From:
> > ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> > [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> > On Behalf Of Mattmann, Chris A (388J)
> > Sent: Friday, January 25, 2013 2:10 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> > Hey James,
> >
> > On 1/24/13 11:55 PM, "Masanz, James J." <Masanz.James@mayo.edu> wrote:
> >
> > >I posted on general@incubator that:
> > >
> > >> One goal is to have a binary that contains all resources, which can
> > >> be used to install cTAKES on a system that does not have an
> > >> internet connection.
> > >> For now we can focus on a first Apache release that doesn't meet
> > >> that goal, while pursuing the question with legal.
> > >> If legal says we can't have that kind of binary here, then in
> > >> the future we can consider if we will host such a binary on a
> > >> different site.
> > >
> > >http://s.apache.org/bgp
> > >
> > >Another motivation for this email is a post by Benson (below) to
> > >general@incubator, where he writes "It's not the mission of the ASF
> > >to create complete, end-user-friendly, software products".
> >
> > Just to clarify -- that's Benson, talking for Roy. :) I realize that
> > this has got all skitzo lately, but just pointing out that this is far
> > from doctrine. Apache OpenOffice is a prime counter example to his
> > point and I just made that point myself.
> >
> > >
> > >I suggest we, or whoever among us are interested in such a thing,
> > >host an easy-to-install *binary* that includes cTAKES plus the models
> > >and jars, somewhere other than apache.org, that would be a single
> > >download with a simple unzip (and would be built off Apache cTAKES
> > >3.0.0-incubating, once it is released).
> >
> > If it comes to this, I'd recommend hosting it at
> > http://apache-extras.org/ which is Google Code, but branded with
> > Apache through a special ComDev agreement set up. Products developed
> > there are said to have an "affinity"
> > towards particular Apache products, but not be those Apache products.
> > Apache Extras != Apache, but still is an option for those parts.
> >
> > >
> > >This binary would probably be released shortly after each Apache
> > >cTAKES release, so it could be built from the officially released
> > >Apache cTAKES source.
> >
> > Yep. I don't think the battle is over there yet though -- I liked your
> > suggestion however -- let's just roll a source release, and try to
> > push the convenience binaries as needed.
> >
> > >
> > >From my understanding, we cannot have models in SVN here if they were
> > >built from data that is not available to the community since the
> > >models are not "source". That's based on this specific comment within
> > >LEGAL-157:
> > >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> >
> > That's Benson's opinion, note Roy hasn't replied to him. I don't read
> > Roy's reading on the subject to be that we can't include those
> > intermediate outputs? Do you?
> 
> Yes, that's the way I'm reading Roy's post - that it can't include models
> (intermediate outputs) because the source for those intermediate outputs is
> not being included.
> 
> > >We also cannot have other compiled jars in our SVN here at
> > >apache.org, and therefore cannot be in our source release, which we
> > >are working on addressing
> >
> > That's not recommended, but also not an absolute blocker and can be
> > improved incrementally. Prior versions of Apache Lucene (and anything
> > built from Ant) had this issue and those releases shipped just fine.
> 
> That's great to know. Thanks.
> 
> > >
> > >For people checking out code from SVN and using maven, those are not
> > >such big issues since maven will fetch the dependencies once we
> > >finish updating the POMs etc.
> > >
> > >If we want to allow people to download a single binary and get the
> > >cTAKES code and the models, it sounds like we either need to
> > >1) write something that would download the models for the users
> > >2) or host the binaries elsewhere
> > >(or require users to download things separately and put them together).
> >
> > I would highly suggest #1 to avoid fragmentation.
> >
> > >
> > >I strongly dislike option 1, so I will focus on option 2 in this
> > >email, as that will be more than enough for one email any way ;)
> >
> > Why don't you like option #1? Just curious.
> 
> Two reasons - a goal is to have an install that is as simple as possible to reduce
> barriers for (very busy) people to give cTAKES a try. (There will be times
> when downloading models of 100s of MB will fail for one reason or another
> on the first attempt.)
> 
> And secondly, the personal experience I've had with writing (commercial)
> install code, which very often turned into a vastly more difficult and time
> consuming (testing-wise) task than people would allow for, and also resulted
> in more enduser questions than anticipated. Which leads to an admittedly
> personal bias against such things, if they can be avoided. But I mentioned #1
> because I know my views on #2 are partially a personal bias.
> 
> > >For people to host such an all-inclusive binary elsewhere, those
> > >people would need to choose a name.
> > >We could create a logo for their use, something like "Apache cTAKES
> > >inside" or  "Powered by Apache cTAKES" (see
> > >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make
> > >it clear the binary is not being released directly by Apache
> > >http://s.apache.org/BAj
> > >
> > >I suggest that we wouldn't need to create a convenience binary here
> > >at Apache - one less thing to test and document.
> > >
> > >This would bring up several questions though, which I'm guessing we
> > >don't want to get into here in great detail since it is really about
> > >something that is not to be released directly from Apache.
> > > - what to call the binary (we would not simply be able to call it
> > >"Apache cTAKES")
> > > - where to host the binary (I'd suggest the ohnlp sourceforge
> > >project, where previous versions of cTAKES live)
> > > - we would need a place to hold the documentation for this binary. I
> > >am assuming we could not host it at apache.org, but we would need
> > >that either confirmed here or create a legal Jira to get that confirmation.
> > > - where would we tell people to go to post questions about the binary?
> > > - where would the build of the binary take place
> > >
> > >I suggest taking those questions offline unless someone tells me
> > >those things are indeed OK to discuss here.
> > >
> > >My main point to discuss here is whether there is enough value in
> > >providing a convenience binary of Apache cTAKES here at apache.org
> > >(which would not contain the models) for us to create and support it
> > >here, or if we skip creating a binary here at apache.org and only
> > >create source packages here.
> > >
> > >I am not trying to splinter the group here. I would hope anyone
> > >involved in producing the binary would be involved here with Apache
> > >cTAKES too.
> > >But there might be people involved in Apache cTAKES that aren't
> > >interested in the details of how a binary is produced or what it
> > >looks like, or even if it is produced.
> >
> > That's a possibility but brings with it a whole horde of other legal
> > mumbo jumbo (and trademarks@) that trust me you don't want to go
> > down (yet). Maybe ever :)
> >
> > Try and focus on #1 -- I bet it's achievable without all the
> > convenience binaries part. Would that work for the community?
> 
> We have previously (before Apache) received lots of positive end user
> feedback about what an improvement providing an all-inclusive binary was
> for them.
> Not providing it is a step backward for us.
> 
> > Cheers,
> > Chris
> > >
> > >-- James
> > >
> 
> -- James
> 
> > >> -----Original Message-----
> > >> From:
> > >> general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> > >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> > >> On Behalf Of Benson Margulies
> > >> Sent: Thursday, January 24, 2013 9:23 PM
> > >> To: general@incubator.apache.org
> > >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > >>
> > >> It's unfortunate to have this conversation in parallel here and on
> > >> https://issues.apache.org/jira/browse/LEGAL-157.
> > >>
> > >> Also, this thread is a combo of the discussion of ordinary
> > >>jars-of-classes  (where I'd forgotten the policy) and the much more
> > >>tangled question of  models, which is what the JIRA is wrestling with.
> > >>
> > >> To answer Ted, I think that Roy might write something like:
> > >>
> > >> "It's not the mission of the ASF to create complete,
> > >>end-user-friendly,  software products. It's our mission to create
> > >>open source code. If someone  else wants to build up an
> > >>end-user-friendly aggregation of ASF code and  models from bombs of
> > >>whatever, that's great, and we encourage them."
> > >>
> > >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <brane@apache.org> wrote:
> > >> > On 25.01.2013 01:50, Ted Dunning wrote:
> > >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <brane@apache.org> wrote:
> > >> >>
> > >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> > >> >>> ...>>
> > >> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> > >> >>>> Well, that's clear enough, even if it is a typical example of
> > >> >>>> how our founders yell at us but we have no mechanism to
> > >> >>>> channel those yells into concise, unambiguous, documentation.
> > >> >>> Perhaps off-topic ... but I fail to see how "source release"
> > >> >>> is ambiguous or not concise.
> > >> >>>
> > >> >>> Unless the Java world has a different definition of "source code"
> > >> >>> than us stuck-in-the-mud plodders, and it's only considered
> > >> >>> binary once it's been JIT-compiled. :)
> > >> >>>
> > >> >>
> > >> >> It isn't necessarily ambiguous when applied to code, but there
> > >> >> is a different case when applied to models  or parameter settings.
> > >> >>
> > >> >> For instance, commons math has polynomial coefficients embedded
> > >> >> in code that approximate certain functions.  These are the
> > >> >> results of computations done using other systems and the source
> > >> >> code and the data used in those other computations are not
> > >> >> included in the released code, only the parameter values are.
> > >> >>
> > >> >> This same sort of thing applies here except that the model in
> > >> >> question has a much larger set of values and is being packaged
> > >> >> in a binary, inspectable format.  Would your opinion change if
> > >> >> the model were expressed in a textual model?  Would it matter
> > >> >> that the textual model is too large and obtuse to usefully inspect?
> > >> >
> > >> > In cases like this one, it would seem reasonable for the source
> > >> > code to refer to those models and computations, which presumably
> > >> > anyone can then reproduce to their own satisfaction. This is
> > >> > unlike compiled code in that compilation results are notoriously
> > >> > hard to reproduce exactly, because they depend on many factors
> > >> > that are usually hard to document, let alone reproduce. I'd
> > >> > expect a mathematical model, no matter how large, does not suffer
> > >> > from such ambiguities (and shut up, Gödel).
> > >> >
> > >> > However, that's beside the point, because ...
> > >> >
> > >> >> What about a hypothetical case where the model is derived from
> > >> >> the explosion of a nuclear bomb?  Would the release of the
> > >> >> numbers require the inclusion of a suitable bomb design so that
> > >> >> everybody could replicate the derivation?
> > >> >
> > >> > ... the issue is not about exposing all the knowledge that
> > >> > goes into writing the code, but about exposing the code itself so that
> > >> > it can be reviewed for, e.g., back-doors and other security issues.
> > >> > Neither of your examples is relevant.
> > >> >
> > >> > -- Brane
> > >> >
> 

