incubator-general mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: [VOTE] Add Any23 to the Apache Incubator
Date Tue, 27 Sep 2011 07:04:43 GMT
+1 (binding)

Tommaso

2011/9/27 Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov>

> Hi Folks,
>
> OK, the proposal discussion has died down and I'm now calling a formal VOTE on
> the Any23 proposal located here:
>
> http://wiki.apache.org/incubator/Any23Proposal
>
> Proposal text copied at the bottom of this email. I'll leave the VOTE open
> through the rest of the week, and close it around Saturday, October 1,
> early AM PDT.
>
> Please VOTE:
>
> [ ] +1 Accept Any23 into the Apache Incubator
> [ ] +0 Don't care
> [ ] -1  Don't Accept Any23 into the Apache Incubator because...
>
> Thanks!
>
> Cheers,
> Chris
>
> P.S. Here's my +1
>
> Proposal Text:
>
> = Any23 =
> == Abstract ==
> The following proposal is about ''Anything To Triples'' (Any23 for short),
> a Java library, a Web service and a set of command line tools to
> extract and validate structured data in [[http://www.w3.org/RDF/|RDF]]
> format from a variety of Web documents and markup formats. Any23 is what is
> informally called an ''RDF Distiller''.
>
> == Proposal ==
> Any23 "Anything to Triples" is a library written in Java 6 and released
> under the Apache 2.0 License. It provides a set of extractors for scraping
> semantic markup (such as [[http://microformats.org/|Microformats]], [[
> http://www.w3.org/TR/rdfa-syntax/|RDFa]] and [[
> http://www.w3.org/TR/microdata/|Microdata]])  from several sources (HTML4,
> XHTML5, CSV), a set of data validations, and a set of parsers and writers to
> handle the main RDF transport formats (RDF/XML, N-Triples, N-Quads, Turtle).
> The library provides a command line tool for data extraction,
> conversion and validation, and a REST service implementation. The library is
> plugin based, allowing the hot loading of new extractors and validators.
> Any23 enables third-party developers to access structured data from Web
> pages without implementing ad-hoc scraping techniques. In this
> sense, Any23 will relieve developers from building complex solutions when
> developing data acquisition pipelines and processes targeted at semantically
> marked-up Web data.
>
> == Background ==
> Any23 was initially developed at [[http://www.deri.ie/|DERI (Digital
> Enterprise Research Institute)]] as the main component of the RDF extraction
> pipeline used in [[http://sindice.com/|Sindice (the Semantic Web Index)]],
> and is now evolving as a joint effort with [[http://www.fbk.eu/|FBK (Fondazione
> Bruno Kessler)]]. At present the official Any23 [[
> http://developers.any23.org|developers page]] contains all the
> documentation, while the code is maintained on [[
> http://code.google.com/p/any23/|Google Code]]. An official up-to-date
> showcase [[http://any23.org|demo]] is also available.
>
> == Rationale ==
> Providing and maintaining a robust, standard and up-to-date library for extracting
> and validating semantic markup from heterogeneous sources would bring
> large benefits to the entire Open Source community. Researchers and academic
> projects have been adopting RDF-related technologies for years, while
> industry is now moving toward Semantic Web technologies more
> concretely. Several industry initiatives related to the [[
> http://en.wikipedia.org/wiki/Semantic_Web|Web of Data]] are taking place
> in these months. [[http://schema.org|Schema.org]], for example, is an
> initiative sponsored by  [[
> http://www.google.com/about/corporate/company/|Google Inc]], [[
> http://info.yahoo.com/center/us/yahoo/|Yahoo Inc]]  and [[
> http://www.microsoft.com/about/companyinformation/en/us/default.aspx|Microsoft Corporation]]
> to structure the data in a harmonized way on [[
> http://dev.w3.org/html5/spec/Overview.html|HTML5]] pages. [[
> http://schema.org|Schema.org]] leverages the [[
> http://dev.w3.org/html5/md/|HTML5 Microdata]] native specification. [[
> http://ogp.me/|OpenGraphProtocol]] is the open standard sponsored by  [[
> https://www.facebook.com/pages/Facebooking/114721225206500|Facebook Inc]]
> to include metadata in HTML page headers.  [[
> http://ogp.me/|OpenGraphProtocol]], initially based on [[
> http://www.w3.org/TR/xhtml-rdfa-primer/|RDFa]], allows describing the
> content of a Web page, and its underlying vocabulary can be directly
> represented using RDF.
>
> = Current Status =
> == Meritocracy ==
> The historical Any23 team believes in meritocracy and has always acted as a
> community. A mailing list, an open issue tracker and other communication channels
> have been in use since the first release. Adoption by a larger
> community, such as Apache, is the natural evolution for Any23. Moreover,
> the Apache standards will reinforce the existing Any23 community practices and
> will be a foundation for future committer involvement.
>
> == Core Developers ==
> In alphabetical order:
>
>  * Davide Palmisano <dpalmisano at gmail dot com>
>  * Giovanni Tummarello <giovanni dot tummarello at deri dot org>
>  * Michele Mostarda <michele dot mostarda at gmail dot com>
>  * Richard Cyganiak <richard at cyganiak dot de>
>  * Reto Bachmann-Gmuer <reto at apache dot org>
>  * Simone Tripodi <simonetripodi at apache dot org>
>  * Szymon Danielczyk <danielczyk.szymon at gmail dot com>
>  * Tommaso Teofili <tommaso at apache dot org>
>
> == Alignment ==
> The main aim of the project is to develop and maintain a full-featured
> semantic markup distiller that can be used by other Apache projects that
> need an RDF extraction tool. The Any23 library core is written using the
> following Apache libraries:
>
>  * [[http://commons.apache.org/lang/|Apache Commons Lang]]
>  * [[http://hc.apache.org/httpclient-3.x/|Apache Commons HTTP Client]]
>  * [[http://commons.apache.org/codec/|Apache Commons Codec]]
>  * [[http://tika.apache.org/|Apache Tika]]
>  * [[http://commons.apache.org/cli/|Apache Commons CLI]]
>  * [[http://poi.apache.org/|Apache POI]]
>
> The Any23 service is targeted to run within any compliant Servlet
> container, such as Tomcat.
>
> = Known Risks =
> == Orphaned Products ==
> The increasing number of Any23 adopters and the rising interest in
> Semantic Web related technologies lead us to believe that there is minimal
> risk of this work being abandoned by the community. Moreover, Any23
> has already been used in production by Sindice.com and other DERI projects
> for years.
>
> == Inexperience with Open Source ==
> All of the committers have experience working in one or more open source
> projects inside and outside ASF.
>
> == Homogeneous Developers ==
> The initial committers are geographically distributed across Europe,
> with no one company being associated with a majority of the developers.
> Many of these initial developers are already experienced Apache committers,
> and all are experienced with working in distributed development
> communities.
>
> == Reliance on Salaried Developers ==
> To the best of our knowledge, most of the initial committers are
> paid to develop code for this project because Any23 is used in
> their organizations' infrastructures. In any case, some of the core
> historical developers (some of them no longer paid by the original
> companies behind Any23) are still committing even though Any23 is not used
> in their current organizations. Any23 has already proven its ability to
> attract external developers.
>
> == Relationships with Other Apache Products ==
> In recent years, other projects relying on the Semantic Web technology
> stack have entered the ASF incubation process, such as Apache Clerezza,
> Stanbol and Jena. This can be seen as evidence of the consolidation and
> growing adoption of such technologies. Beyond the specifics of
> those projects, which share the same underlying stack, Any23 could be employed
> in any project needing a reliable framework to access structured semantic
> markup. The Any23 core could also easily be released as an [[
> http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]] and then
> used to conveniently populate [[
> http://www.openrdf.org/doc/sesame2/system/ch05.html|SAIL-compliant]]
> triple stores.
>
> == An Excessive Fascination with the Apache Brand ==
> Even though the Any23 community recognizes the power and the attractiveness of
> the ASF brand, we are well aware of our already established role in
> the wider Semantic Web developer community. Any23 has already proved its
> reliability in closely supporting all the new specifications coming from the
> Microformats communities, our major contributors in terms of issues opened
> for new feature requests. Furthermore, we are convinced that we can
> enthusiastically bring new and fresh energy into the ASF in order to
> improve our vision, insight and knowledge of the other projects and,
> most importantly, to have the opportunity to enlarge our small community with
> talented and passionate developers.
>
> = Documentation =
> Any23 Documentation
>
>  1. [[http://developers.any23.org/|Any23 Project Homepage]]
>  1. [[http://code.google.com/p/any23/|Any23 Developer Homepage]]
>  1. [[http://any23.org/|Any23 Live Demo]]
>
> Any23 Related Specifications
>
>  1. [[http://www.w3.org/RDF/|RDF]]
>  1. [[http://www.w3.org/TR/html5/|HTML5]]
>  1. [[http://www.w3.org/TR/rdfa-syntax/|RDFa]]
>  1. [[http://www.w3.org/TR/microdata/|Microdata]]
>  1. [[http://microformats.org/|Microformats]]
>  1. [[http://www.w3.org/TR/rdf-syntax-grammar/|RDF/XML]]
>  1. [[http://www.w3.org/TeamSubmission/turtle/|Turtle]]
>  1. [[http://www.w3.org/TR/rdf-testcases/#ntriples|N-Triples]]
>  1. [[http://sw.deri.org/2008/07/n-quads/|N-Quads]]
>
> Any23 Other documentation
>
>  1. [[
> http://www.slideshare.net/dpalmisano/distilling-the-web-of-data-drop-by-drop-with-java|Any23 presentation
> on Slideshare]]
>
> = Initial Source =
> The initial source comprises code developed on [[
> http://code.google.com/p/any23/|GoogleCode]] licensed under the Apache
> License 2.0 (to be contributed under Grant from Giovanni Tummarello for
> Any23).
>
> = Source and Intellectual Property Submission Plan =
> Source code will be moved from [[
> http://code.google.com/p/any23/|GoogleCode]] space inside the SVN space of
> the podling.
>
> = External Dependencies =
> All the external dependencies (and their licenses) used by Any23 follow:
>
>  * [[http://nekohtml.sourceforge.net/|Nekohtml]] (Apache 2.0)
>  * [[http://www.openrdf.org|OpenRDF Sesame]] (BSD-style license)
>  * [[http://jetty.codehaus.org/jetty/|Jetty]] (Apache License 2.0 and
> Eclipse Public License 1.0)
>  * [[http://code.google.com/p/jspf/|Java Simple Plugin Framework]] (new
> BSD License)
>  * [[http://code.google.com/p/boilerpipe/|Boilerpipe]] (Apache License
> 2.0)
>  * [[http://www.slf4j.org/|slf4j]] (MIT License)
>  * [[http://www.junit.org/|junit]] (Common Public License - v 1.0)
>  * [[http://mockito.org/|Mockito]] (MIT License)
>
> = Cryptography =
> The project does not handle cryptography in any way.
>
> = Required Resources =
>  * Mailing lists
>   * any23-private (with moderated subscriptions)
>   * any23-dev
>   * any23-user
>   * any23-commits
>  * Subversion directory
>   * https://svn.apache.org/repos/asf/incubator/any23
>  * Website
>   * Confluence (ANY23)
>  * Issue Tracking
>   * JIRA (ANY23)
>
> = Initial Committers =
> Names of initial committers - in alphabetical order - with current ASF
> status:
>
>  * Chris Mattmann <mattmann at apache dot org> (Member)
>  * Davide Palmisano <dpalmisano at gmail dot com> (ICLA signed)
>  * Giovanni Tummarello <giovanni dot tummarello at deri dot org> (ICLA
> signed)
>  * Lewis John !McGibbney <lewismc at apache dot org> (PMC Member)
>  * Michele Mostarda <michele dot mostarda at gmail dot com> (ICLA signed)
>  * Paul Ramirez <pramirez at apache dot org> (Member)
>  * Reto Bachmann-Gmuer <reto at apache dot org> (Committer)
>  * Szymon Danielczyk <danielczyk.szymon at gmail dot com> (ICLA signed)
>
> = Sponsors =
> == Champion ==
>  * Chris Mattmann <mattmann at apache dot org> (Member)
>
> == Nominated Mentors ==
>  * Chris Mattmann <mattmann at apache dot org>
>  * Paul Ramirez <pramirez at apache dot org>
>  * Simone Tripodi <simonetripodi at apache dot org>
>  * Tommaso Teofili <tommaso at apache dot org>
>
> == Sponsoring Entity ==
>  * Tika PMC
>
> = Other interested people (in alphabetical order) =
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
