community-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Univ. Field Experience
Date Fri, 18 Mar 2011 02:32:38 GMT
Here's what I submitted:
Site Name:  (it's the only address I could find!)
Apache Software Foundation
Dept. 9660, Los Angeles,
CA 90084-9660, U.S.A.
Grant Ingersoll
VP, Apache Lucene
(For Lucene/Solr, Mahout, OpenNLP and Open Relevance projects)

Those interested in other ASF related projects can subscribe to the
mailing list (send an email to and follow the instructions.)
Please list some characteristics of your site:
The Apache Software Foundation (ASF) is one of the preeminent open source software providers
in the world.  The ASF has over 80 different projects and over 2000 volunteer committers,
producing a wide range of software from the HTTPD Server that powers much of the web to the
likes of Hadoop, Lucene, Solr, Mahout and Tomcat.  We are almost entirely a volunteer-driven
organization where people can contribute as they see fit to help a project.  We do almost
all of our collaboration online via email, IRC, etc.  In many ways, I suspect we are unlike
almost any other organization that has submitted here, as we don't have bosses and we all volunteer
to contribute.

As for the projects I'm interested in, Lucene is the preeminent open source search library
on the planet today.  It is used in a large number of applications and services ranging from
mobile devices to sites powering more than 1 billion searches a day.  Solr is a platform on top
of Lucene that makes it easy for people to use Lucene's power without as much programming.
Mahout is a relatively new project focused on scalable machine learning algorithms for clustering,
classification and recommendations, amongst other topics.  OpenNLP is a library focused on
natural language processing tasks like part-of-speech tagging, named entity recognition and
What tasks can your intern expect to perform?:
There are a couple of things that I am specifically looking for, but there are also broader opportunities
for anyone to contribute to any project at the ASF.  I am happy to direct people on where
to go for the latter and can likely point them at potential mentors, but I am really here to
focus on the former, as that is what I intend to mentor on.

Specifically, I am looking for a couple of different things:
1. One or more people to help define and build out a set of corpora (publicly available, with
no intellectual property encumbrances), relevance judgments, queries, etc. for testing search
engines and machine learning algorithms such as Lucene, Mahout, OpenNLP and possibly others
via the Open Relevance Project (ORP).  If you are
familiar with the Text REtrieval Conference (TREC), you can think of ORP as an
open source, TREC-style set of evaluations.  Collections can range from traditional texts (email,
articles, web crawls, etc.) to ecommerce to spatial (local search -- such as open street map).
I'm looking for someone who has the vision to put forth ideas and bring them to fruition.
You don't have to be able to code, but it would be helpful.

2. One or more people to build out an open relevance evaluation web tool for capturing relevance
experiments and evaluating them using common measures such as precision/recall, mean reciprocal
rank, normalized discounted cumulative gain, etc.  Again, the successful candidate will have the opportunity
to put forth a vision for what such a tool should be and then work to make it happen.  This
opportunity requires programming skills, preferably in Java, but other languages can be considered.

3. Lucene, Solr, Mahout and OpenNLP are always looking for contributions in terms of code,
documentation, evaluation, etc.   See the respective project websites (, and for more information on
the projects and then feel free to propose ideas.

4. I'm sure other ASF projects would be willing to entertain other ideas.
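
To make the measures named in item 2 concrete, here is a minimal sketch in Python.  It is
not part of ORP or any existing tool; the function names and data shapes are assumptions
chosen for illustration.  It assumes binary relevance judgments for precision/recall and
mean reciprocal rank, and graded judgments (larger is better) for normalized discounted
cumulative gain (NDCG).

```python
import math


def precision_recall(ranked, relevant, k):
    """Precision and recall at cutoff k for a single query.

    ranked   -- list of document ids in the order the engine returned them
    relevant -- set of document ids judged relevant (binary judgments)
    """
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_reciprocal_rank(runs):
    """Mean over queries of 1/rank of the first relevant result.

    runs -- list of (ranked, relevant) pairs, one per query.
    A query with no relevant result contributes 0.
    """
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(ranked, judgments, k):
    """DCG of the top k results divided by the DCG of the ideal ordering.

    judgments -- dict mapping document id to a graded relevance score;
                 unjudged documents are treated as gain 0 (an assumption).
    """
    gains = [judgments.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0
```

An evaluation tool along these lines would presumably compute such scores per query and
then average them over a query set; how unjudged documents are handled is a design choice
the tool would need to make explicit.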

All work will be done in an open source fashion.  All technical ideas/questions/discussions
will take place on public mailing lists.  Personal issues will be handled by me as the site
supervisor.  Thus, the intern will not only learn valuable real-world skills that may be useful
to large audiences of people, but will also gain intimate knowledge of how open source
projects are built.


On Mar 17, 2011, at 9:33 AM, Ross Gardler wrote:

> On 17/03/2011 13:20, Grant Ingersoll wrote:
>> On Mar 16, 2011, at 8:25 PM, Ross Gardler wrote:
>>>> Do I need Board approval?  I think as an Officer of the ASF I can
>>>> do some of this, but want to make sure I'm proceeding correctly.
>>> No need to bother the Board. It's in our charter to provide
>>> whatever support you need. It's nice to have a real test case to
>>> flesh out the ideas here at the ASF.
>> Specifically, I need to fill out the form at
>> I would like to fill out the Site Name and Address as the ASF and my
>> name as the site supervisor.  Does that seem OK?
> I don't see any reference to terms and conditions or other such legal stuff. Assuming
> that this form does not bind us in any way to anything we don't already offer through ComDev
> then that's fine. As far as I'm concerned you can go ahead with this on a lazy consensus basis.
> Who do you plan to put down under:
> "Contact information for site supervisor (include name, title, e-mail, and phone number)":
> In many ways I think that should be this list. We have a list of projects that are suitable
> for students and I'd like to make these available to SILS. Putting ComDev down means you don't
> have to worry about approaches from people in other areas.
> On the other hand, the definition of "site supervisor" [1] is that of a mentor in our
> language. The responsibilities are very similar to those of our mentors (in fact I think I'll
> steal the couple that are missing in our definition of a mentor).
> Perhaps the best way forward is for you to do it this year as "site supervisor" and,
> if you feel it is appropriate, link them to ComDev to explore a more complete offering from
> the ASF as a whole.
> Ross
> [1]
