lucene-dev mailing list archives

From "Smiley, David W." <dsmi...@mitre.org>
Subject Re: PipeLine for Solr
Date Mon, 18 Apr 2011 16:21:06 GMT
Great discussion here, although I believe it belongs on the Solr user list because we're
not talking about development of Solr itself.  I'm very tempted to cross-post, but I believe
that's discouraged, so I won't.

> Still wondering where the Solr Community will bring this in the future?

I strongly believe that Solr should focus on what it does best (being a search engine) and
not on pipelines / data acquisition, which is really a separate concern that is useful without
Solr -- other apps could use such pipelines.  This is my chief concern with the DataImportHandler (DIH).

By the way, I've used Endeca (a long-established commercial faceted search vendor), which has its
own pipeline called "Forge".  I used it on a project in which the pipelines were extremely extensive,
getting data from a dozen-plus sources of varying flavors and manipulating the data in various
ways.  It addresses a key need, but the implementation is poor IMO. The interesting parts
of it pertained to how it supports joins from sub-pipelines (i.e. chains of steps). I've not
yet been in the same situation with Solr. I've gotten by with some basic stuff thrown together
(shell scripts w/ XSLT) or simple DIH uses.

I've been maintaining a list of software that could be used for a data pipeline for getting
data into Solr.  Here it is:
* Calabash (XProc)
* OpenPipe
* ManifoldCF
* ESBs (various options; includes Spring-Integration Framework)

I don't have UIMA on this list since I think it's more focused on extracting data from unstructured
text than on being a solid pipeline first & foremost.

Roland, if your assessment on OpenPipeline going nowhere is true, then that's disappointing
news.

It's not clear to me that a data pipeline needs to be different from what ESBs do.  Some pieces
are missing, but 80% of what's needed is there.  When I next have a project getting data from
many places, I'll be able to think through this more.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
