incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "Any23Proposal" by DavidePalmisano
Date Tue, 19 Jul 2011 16:55:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "Any23Proposal" page has been changed by DavidePalmisano:
http://wiki.apache.org/incubator/Any23Proposal?action=diff&rev1=29&rev2=30

  
  == Abstract ==
  This proposal is about ''Anything To Triples'' (Any23 for short), a Java library,
- a Web service and a set of command line tools for extracting and validating structured data

+ a Web service and a set of command line tools to extract and validate structured data 
  in [[http://www.w3.org/RDF/|RDF]] format from a variety of Web documents and markup formats.

  Any23 is what is informally called an ''RDF Distiller''.
  
  == Proposal ==
  Any23 "Anything to Triples" is a library written in Java 6 and released under the Apache
2.0 License.
- It provides a set of extractors for scraping semantic markup (such as Microformats, RDFa
and Microdata) 
+ It provides a set of extractors for scraping semantic markup (such as [[http://microformats.org/|Microformats]],
[[http://www.w3.org/TR/rdfa-syntax/|RDFa]] and [[http://www.w3.org/TR/microdata/|Microdata]])

- from several sources (HTML4, XHTML5, CSV), a set of data validations, a set of parsers and
writers for parsing the
+ from several sources (HTML4, XHTML5, CSV), a set of data validations, a set of parsers and
writers for handling the
  main RDF transport formats (RDFXML, Ntriples, NQuads, Turtle). 
  The library provides a command line tool for dealing with data extraction, conversion and
validation,
  and a REST service implementation. The library is plugin based, allowing the hot loading
of new extractors and validators.
  Any23 enables third-party developers to access structured data from Web pages without the need to implement ad-hoc scraping
- techniques.
+ techniques. In this sense, Any23 will relieve developers from building complex solutions when developing data acquisition
+ pipelines and processes targeting semantically marked-up Web data.
  
  == Background ==
  Any23 was initially developed at [[http://www.deri.ie/|DERI (Digital Enterprise Research Institute)]],
@@ -46, +47 @@

  
  == Meritocracy ==
  The historical Any23 team believes in meritocracy and has always acted as a community.
- Mailing list, opened issue tracker and other communication channels have always been
+ Mailing list, open issue tracker and other communication channels have always been
  adopted since its first release. The adoption in a larger community, such as Apache, 
  is the natural evolution for Any23. Moreover, the Apache standards will enforce the
  existing Any23 community practices and will be a foundation for future committers
@@ -77, +78 @@

  The Any23 service is designed to run within any compliant Servlet
  container, such as Tomcat.
  
- It is planned to use [[http://poi.apache.org/|Apache POI]].
+ It is planned to use [[http://poi.apache.org/|Apache POI]] to handle the extraction
+ of metadata from Microsoft documents.
  
  = Known Risks =
  == Orphaned Products ==
  The increasing number of Any23 adopters and the rising interest in Semantic Web related
  technologies lead us to believe that there is minimal risk of this work being abandoned
  by the community. Moreover, Any23 has already been used in production by Sindice.com and
- other DERI projects from years.
+ other DERI projects for years.
  
  == Inexperience with Open Source ==
  All of the committers have experience working in one or more open source projects inside and outside the ASF.
  
  == Homogeneous Developers ==
- The list of initial committers are geographically distributed across the Europe with no
one company being associated with a majority of the developers.  Many of these initial developers
are experienced Apache committers already and all are experienced with working in distributed
development communities.
+ The initial committers are geographically distributed across
+ Europe, with no one company being associated with a majority of the developers.
+ Many of these initial developers are already experienced Apache committers
+ and all are experienced with working in distributed development communities.
  
  == Reliance on Salaried Developers ==
- To the best of our knowledge, the bigger part of the initial committers are being paid to
develop code for this project.
+ To the best of our knowledge, most of the initial committers are being paid to develop code for this project due to
+ the adoption of Any23 in their organizations' infrastructures.
+ In any case, some of the core historical developers (some of them no longer paid by the original companies behind Any23)
+ are still committing even though Any23 is not used in their current organizations. Any23 has already proven its ability to
+ attract external developers.
  
  == Relationships with Other Apache Products ==
  In recent years, other projects relying on the Semantic Web technology stack, such as Apache Clerezza, Stanbol and Jena, have entered the ASF incubation process. This could be seen as evidence of the consolidation and growing adoption of such technologies.
  Apart from the specifics of those projects, which share the same underlying stack, Any23 could be employed in any project needing a reliable
  framework to access structured semantic markup. The Any23 core could also easily be released as a
- [[http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]]. 
+ [[http://wiki.apache.org/nutch/PluginCentral|Apache Nutch Plugin]] and then used to conveniently populate [[http://www.openrdf.org/doc/sesame2/system/ch05.html|SAIL-compliant]] triple stores.

  
  == An Excessive Fascination with the Apache Brand ==
  
@@ -111, +120 @@

  opened issues about new feature requests. Furthermore, we are convinced
  that we can enthusiastically bring new and fresh energy into the ASF
  in order to improve our vision, insights and knowledge about the other 
- project,and feel honored to have opportunity to join the Apache bandwagon.
+ projects and, most importantly, to have the opportunity to enlarge our small
+ community with talented and passionate developers.
  
  = Documentation =
  Any23 Documentation


