Message-ID: <20060927021534.69048.qmail@web50305.mail.yahoo.com>
Date: Tue, 26 Sep 2006 19:15:34 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: [Vote] accept UIMA as a podling - #2
To: general@incubator.apache.org

[X] +1 Accept UIMA as an Incubator podling
[ ] 0 Don't care
[ ] -1 Reject this proposal for the following reason:

Otis

----- Original Message ----
From: Ian Holsman
To: general@incubator.apache.org
Sent: Tuesday, September 26, 2006 7:17:37 PM
Subject: [Vote] accept UIMA as a podling - #2


issues addressed in this release:
1. updated proposal included
2. The first paragraph explains it to a layperson
3. OASIS issue addressed


[ ] +1 Accept UIMA as an Incubator podling
[ ] 0 Don't care
[ ] -1 Reject this proposal for the following reason:


----8<-------Proposal------8<------


Hello everyone -

We are submitting this proposal to the community for a new project in
the incubator, and look forward to starting to work with this
community.

This is a slightly modified and extended version of the proposal that
has already been posted to general@incubator.apache.org. The whole
mail thread can be found
[http://www.nabble.com/Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-tf2154324.html here].

If you don't feel like reading the whole thread, the main question
that came up was: this is all very well, but what does it really
'''do'''?
Attempts to answer that question were made
[http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5986403.html here]
and
[http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5987788.html here].
We have since worked some of these into the proposal itself.

----

= Proposal for Incubation Project: Unstructured Information Management Architecture - UIMA =

== Abstract ==

UIMA is a component framework for the analysis of unstructured
content such as text, audio and video. It comprises an SDK and
tooling for composing and running analytic components written in Java
and C++.


== Proposal: Unstructured Information Management Architecture framework ==

Unstructured Information Management applications are software systems
that analyze large volumes of unstructured information in order to
discover knowledge that is relevant to an end user. We propose UIMA,
a framework and SDK for developing such applications. An example UIM
application might ingest plain text and identify entities, such as
persons, places, organizations; or relations, such as works-for or
located-at. UIMA enables such an application to be decomposed into
components, for example ''"language identification"'' -> ''"language
specific segmentation"'' -> ''"sentence boundary detection"'' ->
''"entity detection (person/place names etc.)"''. Each component
must implement interfaces defined by the framework and must provide
self-describing metadata via XML descriptor files. The framework
manages these components and the data flow between them. Components
are written in Java or C++; the data that flows between components is
designed for efficient mapping between these languages.
UIMA additionally provides capabilities to wrap components as network
services, and can scale to very large volumes by replicating
processing pipelines over a cluster of networked nodes.

This framework has already attracted a following among government,
commercial, and academic institutions who previously developed
analysis algorithms, but were unable to easily build on each other's
work, and who want to be able to evolve their applications by
independently upgrading parts, as better technology becomes
available. Applications built with this framework are being used
with plain text, audio streams, and image/video streams, identifying
entities and relations, converting speech to text, translating into
different languages, and determining properties of images.

The UIMA framework runs components in a flow, passing a common data
object containing unstructured information (free text, audio, video,
etc.) through the components. Each component examines the
unstructured information and data added by other components, and adds
data of its own. The framework mandates a standardized form of the
data being passed, and a standardized form of the interfaces to the
components.

We propose a project to develop, implement, support and enhance this
framework (and, over time, other implementations) that comply with
the UIMA standard (which has been submitted for standardization work
within [http://www.oasis-open.org OASIS]). Members of this community
are encouraged to participate in that effort, as well; OASIS has an
open approach to granting Technical Committee voting rights to
members of OASIS, described here:
http://www.oasis-open.org/committees/process.php#2.4.

The proposal includes both the framework and tools to develop,
describe, compose and deploy UIMA-based components and applications.
The initial work will be based on the UIMA Version 2 framework code
developed by IBM; snapshots of each release of this code are
currently made available on
[http://sourceforge.net/projects/uima-framework SourceForge]. The
Source``Forge versions would be stabilized in maintenance mode, if we
are successful in moving to Apache.

The framework is not specific to any IDE or platform, and does not
depend on other middleware.

Background:

Databases are core components of nearly all applications; they store
information in structured tables. But more and more of the available
digital data is unstructured (e.g. email, web documents, images,
audio clips, video streams) with little information (metadata)
attached to explain its content or context. Although many
applications have been built to process unstructured data, they have
either managed it as a BLOB or they have developed isolated
applications for analyzing the content. In the absence of a
standardized means for analytical applications to share insights
extracted from the content, analytical applications cannot build upon
one another. As a result, the industry has barely begun to tap the
value locked in unstructured information.

Standardization is key to achieving component interoperability, with
capabilities to mix components developed in different places and in
Java, C++ and other languages.
The Unstructured Information Management Architecture defines
standards for component interoperability and application composition
that will provide this needed unifying standard, and allow a variety
of framework implementations to exist, while preserving the goal of
unstructured information analytic component reuse.

UIMA was built to help developers create solutions that get more value
from unstructured information more quickly and at lower cost by making
it easy to reuse and combine analytic modules from different sources
into new analytic applications. The architecture and the framework
have been validated through work with USA's DARPA, which is using it
as a standard for key projects with several universities involved in
advanced linguistics analysis, such as Carnegie Mellon, Columbia,
Stanford and University of Massachusetts. Other organizations, such
as the Mayo Clinic and Sloan Kettering, are also building efforts
around UIMA. In addition, over 15 software vendors, including
companies such as Inxight, Attensity, Clear``Forest, Temis, SPSS, SAS,
Cognos, Endeca, Factiva and others, announced plans to support UIMA.

The UIMA framework (binary and/or source code) has been downloaded
over 8000 times from IBM alphaWorks
(http://www.alphaworks.ibm.com/tech/uima) or Source``Forge
(http://uima-framework.sourceforge.net).

== Rationale ==

We believe that moving the UIMA framework development to the Apache
development community will lead to faster innovation, better
integration with other open source software, and broader adoption of
UIMA, accelerating the industry's ability to get the most value from
text, audio, and video content.
The UIMA framework is becoming attractive to developers who want to
build components; we believe that having UIMA on Apache will
encourage the development of a basic set of open source components
that will jumpstart these developers' efforts. One of the first
components we see possible synergy with is a search component based
on Apache Lucene that would enable semantic search. We like the
concept of the Lucene Sandbox as a way to encourage innovation around
UIMA, and would envision something similar for this project.


== Initial Goals ==

Some initial work we see in the incubator includes the following:

* redoing the parts of the tooling that were done as derivative works
  of Eclipse source code, to enable everything to be licensable under
  the Apache license
* extending the framework to better support "scale-out"
* extending the framework to better align with the emerging UIMA
  Standards work
* extending the framework to support XMI-based SOAP and/or other
  service interfaces
* extending the framework to support OSGi-based approaches to
  componentization and packaging
* exploring embeddings of the framework within other interested
  Apache projects, including synergies with Lucene
* providing aids to the community to migrate from previous versions
  of the framework to the Apache version
* setting up community support: hosting a facility similar to the
  Lucene Sandbox to encourage innovation and experimentation;
  establishing a wiki and some process to allow better documentation
  to be developed by the community, and linking our existing XHTML
  documentation via an XSL transform to Apache FOP


== Current Status ==
=== Meritocracy ===

Meritocracy seems to us an ideal way to grow the community of
developers around UIMA, it being a controlled, rational way to give
those who positively contribute,
more ability to directly contribute. This approach also gives
contributors one of the best reasons to join the community of
volunteers - to be recognized for the merit of their contributions.

=== Community ===
Currently, the UIMA Framework development is being done by IBM, with
input from a group of early adopters in industry and government.
Going forward, we see IBM continuing to support several committers
working on it. We have already begun talking with other people
outside of IBM who have expressed interest in contributing towards
the development. This includes members of academic institutions,
people working for some of the software vendors that have announced
plans to support UIMA, and others from companies that have expressed
interest since initial announcements about our open source plans.
Multiple non-IBM people have already expressed a desire to become
committers.

=== Core Developers ===
The previous core developers of UIMA are Adam Lally, Thilo Goetz,
Marshall Schor, Edward Epstein, Jaroslaw Cwiklik and Thomas Hampp.
Many others have also contributed. The developers come from both the
Research and Development parts of IBM.

=== Alignment ===
UIMA has significant synergy with search applications, and we expect
to see integration with Lucene in the future. UIMA makes use of the
Apache Portable Runtime (APR) for C++ support. It is designed to be
embeddable into other frameworks, such as web application servers.
Part of UIMA is Eclipse-based tooling. We use ANT for build
scripting. UIMA has support for various language bindings including
C++ and Java; we also have more limited bindings for Perl, Python,
and TCL. UIMA uses Web Services as part of its approach to wiring up
components in its domain.
It makes use of XML services such as Xerces and Xalan.

The development of UIMA has been based on merit with open discussion
among a distributed team of developers, from both Research and
Development organizations.

=== License ===

The current license for the source code is CPL, with a small number
of files licensed under the EPL (Eclipse Public License), because
these were created as "derivative works" of existing Eclipse open
source code. When the code base is moved to Apache, it will be
relicensed under the Apache license, except for the small number of
files licensed under the EPL as derivative works of Eclipse source
files. We plan to work in the incubator to redo these parts, so the
entire offering can be licensed under the Apache license.

The distribution for the C++ enablement layer includes the open
source component ICU (a Unicode package), which has its own license.
We plan to work with the community to properly make use of this
non-Apache licensed component. Our current vision for the future of
UIMA has it aligning with and incorporating other standards-based
open source components/protocols, some of which may have licensing
other than the Apache license (for example, the XML Metadata
Interchange (XMI), and the EMF ECore Model from Eclipse); we will
work with the community in figuring out how to move forward on this.

=== Other IP ===

When we requested OASIS to set up a Technical Committee chartered to
develop a platform-independent specification for text and multi-modal
analysis, we specified that it be set up under the "RF on Limited
Terms" mode of the OASIS IP Policy. "RF" means Royalty Free, and
"Limited Terms" means companies that are working with us on the
Technical Committee are restricted in adding additional terms.

These are the most liberal terms and make any Essential Claims
available to ALL and ROYALTY FREE.
For the details please refer to:

* http://www.oasis-open.org/who/ipr/ipr_faq.php
* http://www.oasis-open.org/who/intellectualproperty.php

Ultimately of course, there is always a risk that someone in the
world holds a patent that can be claimed as Essential. The most any
standards organization can do is govern the behavior of those who
participate in its work and publicly document the licensing
commitment of all participants.

== Known Risks ==

=== Orphaned Software ===

UIMA has been in active development for 5 years. The community of
users has steadily grown, and there are now significant commercial
and research organizations actively using it. UIMA is embedded in
IBM software products and is delivered through IBM services
engagements. IBM has developers assigned to it, and is continuing to
support its development. In addition, several people outside of IBM
have already expressed interest in working on UIMA, and have been
providing IBM with initial feedback. One of the objectives of
starting this Apache project is to provide a meritocratic structure
for those people to begin more actively contributing to UIMA.

=== Inexperience with Open Source ===

The individuals working on this software have backgrounds as IBM
software developers. While many of them have experience working with
open source software, none of them has had extensive experience
contributing to other open source software.
However, IBM as an organization has extensive experience contributing
to open source projects and will make available resources to provide
guidance to the developers working on this project.

=== Homogenous Developers (work for same company?) ===

Currently all the developers work for IBM, although they come from
different geographically dispersed organizations within IBM. We will
reach out during the incubation time to get others to contribute; we
have already received interest from several parties.

=== Reliance on salaried developers ===

Currently the developers are paid employees of IBM.

=== Relationships with Other Apache Products ===

We make use of several Apache components (SOAP / Web Services, XML
(Xerces, Xalan), languages (Perl), scripting languages (ANT), Apache
Portable Runtime). In addition, UIMA has been embedded in other
frameworks, such as web application servers, and integrated with
search engines. We are exploring Lucene extensions that could take
advantage of UIMA-processed data. We are currently investigating and
prototyping some software packaging concepts based on OSGi; the
Apache Incubator project Felix may have relevance as we go forward.
The documentation is being moved to XHTML, and we plan to use Apache
FOP for producing PDF reference materials.

=== An Excessive Fascination with the Apache Brand ===
UIMA is already being adopted by a wide cross section of users, both
commercial and academic, world-wide. Our experience shows that
analytic modules can be reused and combined through UIMA, making it
easier and faster for developers to build new analytic applications
for specific industries or domains.
Given the diversity of content and analytics that will be required to
address the multitude of opportunities - from military intelligence
to quality assurance to contact center analytics - growing this
infrastructure so that it better aligns with other major Open Source
communities should help accelerate industry's ability to get value
from content assets.

We believe that the Apache community of developers has the
experience, background, visibility, and synergistic resources to
encourage and foster a vibrant developer community around this project.


== Documentation ==

There is a combined Introduction, Conceptual Overview, Tutorial,
Tools and Framework User's Guides and References, downloadable from
http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference_2.0.pdf


== Scope of the project ==

The project will develop implementations of the UIMA architecture
(which is concurrently being submitted to the OASIS standards
process), supporting the breadth of platforms that developers working
in this field are using, including Java, C++, Perl, Python and TCL;
and utilities and tooling to support component and application
developers and assemblers / packagers. It will initially include the
Java UIMA framework for UIMA Version 2 (you can see a snapshot of
the Version 2 release on Source``Forge; the delivered code would be
this code base plus normal incremental bug fixes and improvements),
plus additional components (mainly documentation and test cases,
which are not currently on Source``Forge).
Over time, the project is expected to grow to include supporting
various embeddings and integrations with other Apache components such
as search engines and web application frameworks.

Over time, we envision the project becoming an umbrella for related
open-source work around UIMA, including things like open-source
pre-annotated corpora, and hosting a facility similar to the Lucene
Sandbox to encourage innovation and experimentation.

The UIMA framework is primarily a set of libraries (in Java, C++,
Perl, etc.), test cases, and UIMA utilities and tools (scripts,
plugins, executables, etc.) used to build, test and debug UIMA
analytic components. The tooling includes several Eclipse platform
plugins.

== Initial source ==

The source currently is maintained in IBM internal software control
systems, with a copy of each release placed on SourceForge. At the
time of launch, we plan to contribute the latest version of the code
base (with some renaming of package prefixes to reflect apache.org),
test cases, build files, and documentation, under the terms specified
in the ASF Corporate Contributor License. We plan to donate the
existing C++ enablement layer and the support for Perl, Python, and
TCL a few months later than the initial donation; this delay is to
give us time to finish preparing that code base for Open Source.

== ASF resources to be created ==

Mailing lists:
* uima-dev
* uima-commits
* uima-user (we already have a substantial user community and expect
  them to turn up at Apache soon after we've hopefully been accepted
  into the incubator)

For other resources such as Subversion repository, JIRA etc.
we hope for guidance from our mentors.

== Initial Set of Committers ==

* Michael Baessler (mba@michael-baessler.de)
* Edward Epstein (eddie@aewatercolors.com)
* Thilo Goetz (twgoetz@gmx.de)
* Adam Lally (alally@alum.rpi.edu)
* Marshall Schor (msa@schor.com)

=== Sponsor ===

We are requesting the Incubator to sponsor this. Our current vision
is that it will become a top level project (other projects that
develop UIMA components could become subprojects, for instance).

=== Mentors ===

* Sam Ruby (ruby@apache.org)
* Ken Coar (coar@apache.org)
* Ian Holsman (lists@holsman.net)

=== Section 6: Open Issues for Discussion ===


--
Ian Holsman
Ian@Holsman.net
http://garden-gossip.com/ -- what's in your garden?