incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "LuceneConnectorFrameworkProposal" by GrantIngersoll
Date Thu, 31 Dec 2009 13:12:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "LuceneConnectorFrameworkProposal" page has been changed by GrantIngersoll.


New page:
= Lucene Connector Framework =
== Abstract ==

Many search engines, as well as other applications, need to connect with content
repositories (SharePoint, CMS, Documentum, etc.) in a standard manner.  The Lucene Connector
Framework (LCF) is a project aimed at building out these connectors in open source under the
Apache brand.

== Proposal ==

The goal of LCF is to create a viable Lucene subproject aimed at delivering a best-of-breed
connector framework under the Apache Lucene name.  As a framework, the project will not only
provide a way to connect to individual repositories, but also a mechanism for plugging in
new connectors or custom connectors in a straightforward manner.

A connector framework is vital for search engines and other tools that need to access data
located in corporate repositories.  By abstracting the problem into a framework, applications
can code to a set of well-defined interfaces instead of having to use a different interface
for each connector.
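
As a rough illustration of what "coding to a set of well-defined interfaces" could look like,
here is a minimal Python sketch. It is not LCF's actual API: the names RepositoryConnector,
InMemoryConnector, register_connector, create_connector and crawl are all hypothetical, and
the in-memory repository merely stands in for a real system such as SharePoint or Documentum.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List, Tuple


class RepositoryConnector(ABC):
    """Hypothetical common interface that every connector would implement."""

    @abstractmethod
    def list_document_ids(self) -> Iterator[str]:
        """Yield the identifiers of all documents in the repository."""

    @abstractmethod
    def fetch(self, doc_id: str) -> bytes:
        """Return the raw content of one document."""


class InMemoryConnector(RepositoryConnector):
    """Toy connector backed by a dict, standing in for a real repository."""

    def __init__(self, docs: Dict[str, bytes]) -> None:
        self._docs = docs

    def list_document_ids(self) -> Iterator[str]:
        return iter(sorted(self._docs))

    def fetch(self, doc_id: str) -> bytes:
        return self._docs[doc_id]


# Hypothetical plug-in registry: connector name -> factory callable.
_REGISTRY: Dict[str, Callable[..., RepositoryConnector]] = {}


def register_connector(name: str, factory: Callable[..., RepositoryConnector]) -> None:
    """Make a new or custom connector available under a name."""
    _REGISTRY[name] = factory


def create_connector(name: str, **config) -> RepositoryConnector:
    """Instantiate a registered connector from its configuration."""
    return _REGISTRY[name](**config)


def crawl(connector: RepositoryConnector) -> List[Tuple[str, bytes]]:
    """Application code: depends only on the interface, not on any one repository."""
    return [(doc_id, connector.fetch(doc_id)) for doc_id in connector.list_document_ids()]


register_connector("memory", InMemoryConnector)
```

In this sketch, an application would call create_connector("memory", docs={...}) and pass the
result to crawl(); supporting another repository would mean registering another factory, with
no change to crawl() itself.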

Key features:
 * Connectors for many existing corporate repositories: SharePoint, web crawling, RSS, Database,
FileNet, LiveLink, Documentum, etc.
 * Supports incremental connections
 * Provides awareness of security
 * User interface for configuring the connectors
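
The proposal does not say how incremental connections or security awareness are implemented.
One common approach, sketched below in Python purely as an assumption, is to remember a version
string per document so that only changed or deleted documents are reprocessed, and to carry
each document's access tokens alongside it so results can be filtered per user. The names
RepoDocument, incremental_sync and visible_to are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class RepoDocument:
    """Hypothetical document record as a connector might report it."""
    doc_id: str
    version: str              # changes whenever the document changes
    allowed: FrozenSet[str]   # access tokens (users/groups) permitted to read it
    body: str


def incremental_sync(
    current: List[RepoDocument], seen: Dict[str, str]
) -> Tuple[List[RepoDocument], List[str]]:
    """Return (changed documents, deleted ids) and update `seen` in place.

    `seen` maps doc_id -> version recorded by the previous run.
    """
    changed = [d for d in current if seen.get(d.doc_id) != d.version]
    current_ids = {d.doc_id for d in current}
    deleted = sorted(doc_id for doc_id in seen if doc_id not in current_ids)
    for d in changed:
        seen[d.doc_id] = d.version
    for doc_id in deleted:
        del seen[doc_id]
    return changed, deleted


def visible_to(docs: List[RepoDocument], user_tokens: FrozenSet[str]) -> List[RepoDocument]:
    """Keep only documents sharing at least one access token with the user."""
    return [d for d in docs if d.allowed & user_tokens]
```

On a first run with an empty `seen` mapping every document counts as changed; on later runs
only documents whose version differs, or which have disappeared, are reported.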

== Background ==

MetaCarta originally approached Grant Ingersoll from the Lucene PMC about donating their existing
connector framework to the Lucene PMC.  After some discussion about accepting it as a software
grant, the PMC decided it would be best to incubate the project first.

== Rationale ==

The Connector Framework fills an often significant gap in the Lucene experience, namely, how
to get content locked away in a content repository into Lucene/Solr/Nutch/Mahout/Tika.  Naturally,
many other tools (search engines and others) will also have this same problem.  A Connector
Framework would also be useful for someone wishing to migrate between content repositories.

= Current Status =

== Meritocracy ==

Building the community using a meritocratic approach is very important to the success of LCF.
We know many people in the search space (and otherwise) have either written their own
connectors or are in need of connectors.  Thus, we expect a meritocratic community will lead
to widespread participation.

== Community ==

Our hope is that our existing code, features and capabilities will attract a large community
of both developers and users. We also believe that other organizations will find this project
interesting and relevant, and contribute resources.

The user community of LCF would be similar to that of the other Lucene projects, and in many
cases they would overlap. 

== Core Developers ==

See the initial committer list below.

== Alignment ==

We expect LCF will align quite well with the existing Lucene community and will also provide
significant value to other ASF and non-ASF projects as well as many companies and individuals
looking to access their content repositories in a programmatic fashion.

= Known Risks =

== Orphaned Products ==

The Connector Framework is an important piece of any search engine, including MetaCarta's,
as it provides the primary mechanism for getting content out of a repository and into the
search engine's index.  Thus, we don't expect it will be orphaned anytime soon.  Once the
project is established and the code is available, we expect to attract not only other search
companies, but others with similar needs.

== Inexperience with Open Source ==
Grant Ingersoll provides the majority of the experience with Open Source at the ASF, but all
of the initial committers are familiar with Open Source and have contributed to other open
source projects.

== Homogeneous Developers ==

The current committers are mostly members of either the MetaCarta or Lucid Imagination
developer teams, but we are actively recruiting other developers. We plan on quickly recruiting
other committers from the Lucene community.

== Reliance on Salaried Developers ==

All of the committers are salaried employees of their respective companies.  We plan on recruiting
other committers/contributors as quickly as possible.

== Cryptography ==


== Legal Concerns ==

Some of the connectors in the existing framework require paid licenses to use.  We will need
to evaluate each connector to see what can be appropriately included.  For those connectors
that require a paid license, we will need to determine a plan for including the wrapper code
without the underlying bindings in a legal manner.  We expect we can provide the wrapper code
without the binding and that the code will thus only be compilable by someone who has access
to the binding.  (This is what Google has done for its individual connectors.)  Longer term,
we expect to demonstrate to the companies with proprietary connectors why it is more valuable
for them to open up their specific connector pieces to give broader access to people looking
to leverage their content in the repository.

=== Trademark ===

The project is being rebranded from a MetaCarta internal name to the Lucene Connector Framework,
which will be an ASF mark.  

== Relationships with Other Apache Products ==

We expect almost all of the Apache Lucene ecosystem will benefit from having a standard way
of connecting to content repositories.  Additionally, users of UIMA should also benefit.

== An Excessive Fascination with the Apache Brand ==

All of us are familiar with the value that Apache brings to a project in building out a community.
We are also all significant users of Apache Lucene and related tools (Solr, Nutch, Mahout,
Tika) and expect a close relationship with those projects will help significantly grow the
LCF community.

= Documentation =


= Initial Source =

All code is currently developed in-house at MetaCarta.

= Source and Intellectual Property Submission Plan =

TODO: Upload a tarball and a Software Grant

= External Dependencies =


= Required Resources =
 * Mailing lists
  * connectors-private (with moderated subscriptions)
  * connectors-user@
  * connectors-dev@
  * connectors-commit@
 * Subversion directory
 * Website
  * Confluence (CONNECTORS)
 * Issue Tracking

= Initial Committers =
Names of initial committers with affiliation and current ASF status:

 * Karl Wright (kwright at metacarta)
 * Josiah Strandberg (jstrandberg at metacarta)
 * Ken Baker (bakerkj at metacarta)
 * Marc Meadows (mam at metacarta)
 * Grant Ingersoll (gsingers@a.o Lucid Imagination)
 * Brian Pinkerton (brian.pinkerton at Lucid Imagination)

= Sponsors =
== Champion ==
 * Grant Ingersoll

== Nominated Mentors ==
 * Grant Ingersoll

== Sponsoring Entity ==
 . Apache Lucene PMC: Message ID: in private@lucene.a.o
