incubator-ctakes-dev mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
Date Fri, 25 Jan 2013 17:35:21 GMT
> -----Original Message-----
> From: ctakes-dev-return-1113-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1113-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Chen, Pei
> Sent: Friday, January 25, 2013 9:42 AM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Based on the ongoing discussions,
> Could I suggest we cancel the VOTE on RC5 and create an RC6?

+1  to that

> RC6 will be an extremely conservative-

Again +1 to that

> - No resources (models) included in src/main/java
> - No resources (models) included in the -bin.tar.gz
> - Move all of the models and resources to a ctakes-models project within
> the ctakes-resources on sourceforge (currently used by the UMLS resources
> already).
> - Update the pom.xml's to download those for developers via maven.
> - End-Users will have to download and unzip a ctakes-resources.zip which
> contains all of the models and resources (including UMLS).

All sound good as a compromise for this release to me.

> I believe this is just a temporary measure (at least a decent compromise)
> until we get clarity on some of these items.
> We can create subsequent releases afterwards such as a single -bin.tar.gz
> that includes the models just like any other 3rd party lib, 

The models are dependent on the outcome of
https://issues.apache.org/jira/browse/LEGAL-157


Regards, 
James Masanz

> and then possibly including it in src as well.
> 
> I do not think this is an "end user friendly" issue. IMHO, it just doesn't
> make sense to separate out parts of software that are an intricate part
> of the software and are always required to function properly such as
> icons, gifs, jpgs, or statistical models in this case (which have been
> approved to be released under ASL 2.0 terms by their contributors).
> 
> --Pei
> 
> 
> > -----Original Message-----
> > From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> > Sent: Friday, January 25, 2013 10:11 AM
> > To: 'ctakes-dev@incubator.apache.org'
> > Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> >
> > > -----Original Message-----
> > > From:
> > > ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> > > [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> > > On Behalf Of Mattmann, Chris A (388J)
> > > Sent: Friday, January 25, 2013 2:10 AM
> > > To: ctakes-dev@incubator.apache.org
> > > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > >
> > > Hey James,
> > >
> > > On 1/24/13 11:55 PM, "Masanz, James J." <Masanz.James@mayo.edu> wrote:
> > >
> > > >I posted on general@incubator that:
> > > >
> > > >> One goal is to have a binary that contains all resources, which
> > > >> can be used to install cTAKES on a system that does not have an
> > > >> internet connection.
> > > >> For now we can focus on a first Apache release that doesn't meet
> > > >> that goal, while pursuing the question with legal.
> > > >> If legal says we can't have that kind of binary here, then in
> > > >> the future we can consider if we will host such a binary on a
> > > >> different site.
> > > >
> > > >http://s.apache.org/bgp
> > > >
> > > >Another motivation for this email is a post by Benson (below) to
> > > >general@incubator, where he writes "It's not the mission of the ASF
> > > >to create complete, end-user-friendly, software products".
> > >
> > > Just to clarify -- that's Benson, talking for Roy. :) I realize that
> > > this has got all skitzo lately, but just pointing out that this is
> > > far from doctrine. Apache OpenOffice is a prime counter example to
> > > his point and I just made that point myself.
> > >
> > > >
> > > >I suggest we, or whoever among us are interested in such a thing,
> > > >host an easy-to-install *binary* that includes cTAKES plus the
> > > >models and jars, somewhere other than apache.org, that would be a
> > > >single download with a simple unzip (and would be built off Apache
> > > >cTAKES 3.0.0-incubating, once it is released).
> > >
> > > If it comes to this, I'd recommend hosting it at
> > > http://apache-extras.org/ which is Google Code, but branded with
> > > Apache through a special ComDev agreement set up. Products developed
> > > there are said to have an "affinity" towards particular Apache
> > > products, but not be those Apache products.
> > > Apache Extras != Apache, but still is an option for those parts.
> > >
> > > >
> > > >This binary would probably be released shortly after each Apache
> > > >cTAKES release, so it could be built from the officially released
> > > >Apache cTAKES source.
> > >
> > > Yep. I don't think the battle is over there yet though -- I liked
> > > your suggestion however -- let's just roll a source release, and try
> > > to push the convenience binaries as needed.
> > >
> > > >
> > > >From my understanding, we cannot have models in SVN here if they
> > > >were built from data that is not available to the community since
> > > >the models are not "source". That's based on this specific comment
> > > >within LEGAL-157:
> > > >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> > >
> > > That's Benson's opinion, note Roy hasn't replied to him. I don't
> > > read Roy's reading on the subject to be that we can't include those
> > > intermediate outputs? Do you?
> >
> > Yes, that's the way I read Roy's post - that it can't include
> > models (intermediate outputs) because the source for those
> > intermediate outputs is not being included.
> >
> > > >We also cannot have other compiled jars in our SVN here at
> > > >apache.org, and therefore they cannot be in our source release,
> > > >which we are working on addressing
> > >
> > > That's not recommended, but also not an absolute blocker and can be
> > > improved incrementally. Prior versions of Apache Lucene (and
> > > anything built from Ant) had this issue and those releases shipped
> > > just fine.
> >
> > That's great to know. Thanks.
> >
> > > >
> > > >For people checking out code from SVN and using maven, those are
> > > >not such big issues since maven will fetch the dependencies once we
> > > >finish updating the POMs etc.
> > > >
> > > >If we want to allow people to download a single binary and get the
> > > >cTAKES code and the models, it sounds like we either need to
> > > >1) write something that would download the models for the users
> > > >2) or host the binaries elsewhere
> > > >(or require users to download things separately and put them
> > > >together).
> > >
> > > I would highly suggest #1 to avoid fragmentation.
> > >
> > > >
> > > >I strongly dislike option 1, so I will focus on option 2 in this
> > > >email, as that will be more than enough for one email any way ;)
> > >
> > > Why don't you like option #1? Just curious.
> >
> > Two reasons - a goal is to have an install that is as simple as
> > possible to reduce barriers for (very busy) people to give cTAKES a
> > try. (There will be times when downloading models of 100s of MB will
> > fail for one reason or another on the first attempt.)
> >
> > And secondly, the personal experience I've had with writing
> > (commercial) install code, which very often turned into a vastly more
> > difficult and time consuming (testing-wise) task than people would
> > allow for, and also resulted in more enduser questions than
> > anticipated. Which leads to an admittedly personal bias against such
> > things, if they can be avoided. But I mentioned #1 because I know my
> > views on #2 are partially a personal bias.
> >
> > > >For people to host such an all-inclusive binary elsewhere, those
> > > >people would need to choose a name.
> > > >We could create a logo for their use, something like "Apache cTAKES
> > > >inside" or  "Powered by Apache cTAKES" (see
> > > >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make
> > > >it clear the binary is not being released directly by Apache
> > > >http://s.apache.org/BAj
> > > >
> > > >I suggest that we wouldn't need to create a convenience binary here
> > > >at Apache - one less thing to test and document.
> > > >
> > > >This would bring up several questions though, which I'm guessing we
> > > >don't want to get into here in great detail since it is really
> > > >about something that is not to be released directly from Apache.
> > > > - what to call the binary (we would not simply be able to call it
> > > >"Apache cTAKES")
> > > > - where to host the binary (I'd suggest the ohnlp sourceforge
> > > >project, where previous versions of cTAKES live)
> > > > - we would need a place to hold the documentation for this binary.
> > > >I am assuming we could not host it at apache.org, but we would need
> > > >that either confirmed here or create a legal Jira to get that
> > > >confirmation.
> > > > - where would we tell people to go to post questions about the
> > > >binary?
> > > > - where would the build of the binary take place
> > > >
> > > >I suggest taking those questions offline unless someone tells me
> > > >those things are indeed OK to discuss here.
> > > >
> > > >My main point to discuss here is whether there is enough value in
> > > >providing a convenience binary of Apache cTAKES here at apache.org
> > > >(which would not contain the models) for us to create and support
> > > >it here, or if we skip creating a binary here at apache.org and only
> > > >create source packages here.
> > > >
> > > >I am not trying to splinter the group here. I would hope anyone
> > > >involved in producing the binary would be involved here with Apache
> > > >cTAKES too.
> > > >But there might be people involved in Apache cTAKES that aren't
> > > >interested in the details of how a binary is produced or what it
> > > >looks like, or even if it is produced.
> > >
> > > That's a possibility but brings with it a whole horde of other legal
> > > mumbo jumbo (and trademarks@) that, trust me, you don't want to go
> > > down (yet).
> > > Maybe ever :)
> > >
> > > Try and focus on #1 -- I bet it's achievable without all the
> > > convenience binaries part. Would that work for the community?
> >
> > We have previously (before Apache) received lots of positive end user
> > feedback about what an improvement providing an all-inclusive binary
> > was for them.
> > Not providing it is a step backward for us.
> >
> > > Cheers,
> > > Chris
> > > >
> > > >-- James
> > > >
> >
> > -- James
> >
> > > >> -----Original Message-----
> > > >> From:
> > > >> general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> > > >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> > > >> On Behalf Of Benson Margulies
> > > >> Sent: Thursday, January 24, 2013 9:23 PM
> > > >> To: general@incubator.apache.org
> > > >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > > >>
> > > >> It's unfortunate to have this conversation in parallel here and
> > > >> on https://issues.apache.org/jira/browse/LEGAL-157.
> > > >>
> > > >> Also, this thread is a combo of the discussion of ordinary
> > > >> jars-of-classes (where I'd forgotten the policy) and the much
> > > >> more tangled question of models, which is what the JIRA is
> > > >> wrestling with.
> > > >>
> > > >> To answer Ted, I think that Roy might write something like:
> > > >>
> > > >> "It's not the mission of the ASF to create complete,
> > > >> end-user-friendly, software products. It's our mission to create
> > > >> open source code. If someone else wants to build up an
> > > >> end-user-friendly aggregation of ASF code and models from bombs
> > > >> of whatever, that's great, and we encourage them."
> > > >>
> > > >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <brane@apache.org> wrote:
> > > >> > On 25.01.2013 01:50, Ted Dunning wrote:
> > > >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <brane@apache.org> wrote:
> > > >> >>
> > > >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> > > >> >>> ...>>
> > > >> >>>>> I am referring to this discussion http://s.apache.org/MUZ
> > > >> >>>> Well, that's clear enough, even if it is a typical example of
> > > >> >>>> how our founders yell at us but we have no mechanism to
> > > >> >>>> channel those yells into concise, unambiguous documentation.
> > > >> >>> Perhaps off-topic ... but I fail to see how "source release"
> > > >> >>> is ambiguous or not concise.
> > > >> >>>
> > > >> >>> Unless the Java world has a different definition of "source code"
> > > >> >>> than us stuck-in-the-mud plodders, and it's only considered
> > > >> >>> binary once it's been JIT-compiled. :)
> > > >> >>>
> > > >> >>
> > > >> >> It isn't necessarily ambiguous when applied to code, but there
> > > >> >> is a different case when applied to models or parameter settings.
> > > >> >>
> > > >> >> For instance, commons math has polynomial coefficients
> > > >> >> embedded in code that approximate certain functions.  These
> > > >> >> are the results of computations done using other systems and
> > > >> >> the source code and the data used in those other computations
> > > >> >> are not included in the released code, only the parameter values are.
> > > >> >>
> > > >> >> This same sort of thing applies here except that the model in
> > > >> >> question has a much larger set of values and is being packaged
> > > >> >> in a binary, inspectable format.  Would your opinion change if
> > > >> >> the model were expressed in a textual model?  Would it matter
> > > >> >> that the textual model is too large and obtuse to usefully inspect?
> > > >> >
> > > >> > In cases like this one, it would seem reasonable for the source
> > > >> > code to refer to those models and computations, which
> > > >> > presumably anyone can then reproduce to their own satisfaction.
> > > >> > This is unlike compiled code in that compilation results are
> > > >> > notoriously hard to reproduce exactly, because they depend on
> > > >> > many factors that are usually hard to document, let alone
> > > >> > reproduce. I'd expect a mathematical model, no matter how
> > > >> > large, does not suffer from such ambiguities (and shut up, Gödel).
> > > >> >
> > > >> > However, that's beside the point, because ...
> > > >> >
> > > >> >> What about a hypothetical case where the model is derived from
> > > >> >> the explosion of a nuclear bomb?  Would the release of the
> > > >> >> numbers require the inclusion of a suitable bomb design so
> > > >> >> that everybody could replicate the derivation?
> > > >> >
> > > >> > ... the issue is not about exposing all the knowledge that
> > > >> > goes into writing the code, but about exposing the code itself so
> > > >> > that it can be reviewed for, e.g., back-doors and other security
> > > >> > issues.
> > > >> > Neither of your examples is relevant.
> > > >> >
> > > >> > -- Brane
> > > >> >
> >

