incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "OpenNLPProposal" by JasonBaldridge
Date Wed, 03 Nov 2010 17:57:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "OpenNLPProposal" page has been changed by JasonBaldridge.
http://wiki.apache.org/incubator/OpenNLPProposal?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  == Background ==
  
- OpenNLP was started in 2000 by Jason Baldridge and Gann Bierner while they were graduate
students in the Division of Informatics at the University of Edinburgh. The initial codebase
for OpenNLP came out of the Grok natural language parsing toolkit which was used heavily in
both Baldridge's and Bierner's dissertations. The first paper that used Grok, and especially
the components that would become OpenNLP is [[http://comp.ling.utexas.edu/jbaldrid/papers/hockenmaier_etal_ESSLLI2000.pdf|Hockenmaier,
Bierner and Baldridge (2000)]] (later updated as the journal article [[http://comp.ling.utexas.edu/jbaldrid/papers/HockenmaierEtal2004.pdf|Hockenmaier,
Bierner, and Baldridge (2004)]]).
+ OpenNLP was started in 2000 by Jason Baldridge and Gann Bierner while they were graduate
students in the Division of Informatics at the University of Edinburgh. OpenNLP, broadly speaking,
was meant to be a high-level organizational unit for various open source software packages
for natural language processing; more practically, it provided a high-level package name for
various Java packages of the form opennlp.*. The first OpenNLP software package was the Grok
natural language parsing toolkit, which was also the genesis of what is now called the OpenNLP
Toolkit. The software released on the OpenNLP SourceForge site (started in 2000, along with
Grok) was simply a set of interfaces defined in the package opennlp.common and referred to
as the OpenNLP Java API. The actual implementations of natural language processing components
were provided in Grok, along with code for sentence parsing with Combinatory Categorial Grammar.
This code was used heavily in both Baldridge's and Bierner's dissertations. The first paper
that used Grok, and especially the components that would become the OpenNLP Toolkit, is [[http://comp.ling.utexas.edu/jbaldrid/papers/hockenmaier_etal_ESSLLI2000.pdf|Hockenmaier,
Bierner and Baldridge (2000)]] (later updated as the journal article [[http://comp.ling.utexas.edu/jbaldrid/papers/HockenmaierEtal2004.pdf|Hockenmaier,
Bierner, and Baldridge (2004)]]).
  
- In 2000, Grok was split into two projects: OpenNLP tools for the core natural language processing
infrastructure and the Grok/OpenCCG library (openccg.sf.net) for parsing with categorial grammar.
Both projects have evolved independently since then and have mostly independent active developer
and user communities. OpenCCG is primarily used in the academic community, while OpenNLP has
considerable use in both academia and industry. As in indication of the academic impact of
OpenNLP, a search on Google scholar (done in March 2010) returned about 650 publications citing
the package. Some of these include the OpenNLP website and a few non-publications plus some
self-citations. Based on a scan of these results, we estimate that about 500 actual publications
have used OpenNLP in their work, and there are an addition 50 or so quasi-publications like
surveys and instruction manuals.
+ In 2003, it was decided to remove the NLP infrastructure from Grok as there was a clear
separation between the basic text processing components and the syntactic and semantic analysis
components. At the same time, Grok was rebranded as OpenCCG (openccg.sf.net). The final release
of the OpenNLP Java API was made in March 2003; the new OpenNLP Toolkit was created from the
API and the Grok text processing components, with version 1.0 being released in April 2004.
The OpenNLP Toolkit and OpenCCG have evolved independently since then and have mostly independent
and active developer and user communities. OpenCCG is primarily used in the academic community,
while OpenNLP has considerable use in both academia and industry. As an indication of the
academic impact of OpenNLP, a search on Google Scholar (done in March 2010) returned about
650 publications citing the package. Some of these hits are the OpenNLP website itself, a few
non-publications, and some self-citations. Based on a scan of these results, we estimate that about 500 actual
publications have used OpenNLP in their work, along with an additional 50 or so quasi-publications
like surveys and instruction manuals.
  
- The activity level of the OpenNLP project has risen and fallen over that past 10+ years,
with a large uptick in the last two years especially. Most recently, due both to the availability
of new documentation and the release of version 1.5 , there have been many more downloads
and page views for the OpenNLP project. In fact, September 2010 had the most downloads (1,561)
and project web hits (226,391) of any month since the project’s beginning in 2000, and October
is keeping pacing with that figure so far. As a result, OpenNLP has gone from being in the
2000th to 4000th ranked project (between January and May, 2010) to being ranked 570, 314,
181 and 439 for July, August, September, and October respectively. Full details are available
on the Sourceforge statistics page for OpenNLP.  (There are 240,000 projects hosted on SourceForge,
though this figure includes many, many projects that never actually get started: it seems
that about 7-10% of these are stable, active projects based on a review done in 2007.) 
+ The activity level of the OpenNLP project has fluctuated over the past 10+ years, with
a large uptick in the last two years especially. Most recently, due both to the availability
of new documentation and the release of version 1.5, there have been many more downloads
and page views for the OpenNLP project. In fact, September 2010 had the most downloads (1,561)
and project web hits (226,391) of any month since the project’s beginning in 2000, and October
is keeping pace with those figures so far. As a result, OpenNLP has gone from being ranked
somewhere from 2000th to 4000th (January through May 2010) to being ranked 570, 314,
181 and 439 for July, August, September, and October respectively. Full details are available
on the SourceForge statistics page for OpenNLP.  (There are 240,000 projects hosted on SourceForge,
though this figure includes many, many projects that never actually get started: it seems
that about 7-10% of these are stable, active projects based on a review done in 2007.) 
  
  == Rationale ==
  
