On 18 April 2011 13:30, Roland Villemoes <rv@alpha-solutions.dk> wrote:

Hi Gérard


Thank you for your reply.

You're welcome ;-)

I will absolutely look into it :-)


Still wondering where the Solr community will take this in the future.

As with Lucene, Solr is mainly focused on indexing and not preprocessing. I'm convinced that it's good for an open source project to stick to its core added value. I've seen too many projects die because they tried to do everything. But perhaps this could be done in Nutch, which covers almost every part of a search engine. UIMA (as was suggested) may also be a good solution.

Looking at commercial products (we use these a lot here at Alpha Solutions), products like Exalead and FAST really do have impressive content (and search) pipelines and, most of all, impressive tools included. And as the future of FAST is extremely uncertain, FAST customers now moving to Solr will lack the pipelines and the tools.

But I don't suggest that open source projects follow any of these commercial product roadmaps ;-) I prefer modular, self-contained and efficient open source projects. Then for integration, we need another layer like UIMA or WebLab.

Well, as consultants we can provide the functionality by developing the missing pieces – but tools are still missing. And where customers could (almost) administer and work on pipelines themselves – they now need developers.

That's the tricky part. Hopefully these projects will mature to a level where administration of high-level orchestration is easy. But to be frank, it's not really easy in many ways, and if end users want to administer this part themselves, they will still need some basic understanding and training.

Thanks for the input – looking forward to seeing more :-)

Good luck, keep me informed.



Roland Villemoes

From: Gérard Dupont [mailto:ger.dupont@gmail.com]
Sent: 18 April 2011 12:50
To: dev@lucene.apache.org
Subject: Re: PipeLine for Solr


Hi Roland,


We are proposing exactly this kind of integration facility with our open source WebLab project (see weblab-project.org). The tutorials are not perfect, but we are a team of about 15 engineers on the project, which has more than 4 years of history and is currently used in our projects. Our goal is to rely as much as possible on standards, and thus each processing step (SourceReader, Normaliser, Analyser...) is defined as a web service. The global orchestration is then done in BPEL. On the plus side we have a Solr indexer, but I'm quite sure it's not very optimised ;-).


If you are interested I'll be happy to support you (I'm paid for that already ;-).




On 18 April 2011 12:37, Roland Villemoes <rv@alpha-solutions.dk> wrote:

Hi All,


I know this question may have been asked before – but I really did not find any usable answers browsing the archives. So I have to try the developer list here.


We at Alpha Solutions often need a Pipeline for handling crawling, analyzing and routing before we hit the UpdateRequestHandler in Solr. I know we could actually use the UpdateRequestHandler for this - but often we like to perform all these tasks before hitting Solr.
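
As a rough illustration of the kind of pre-Solr pipeline described above, here is a minimal Python sketch. The stage names (`normalise`, `analyse`, `route`), the field names (`text`, `word_count`) and `to_solr_update_body` are hypothetical and are not taken from OpenPipeline or Solr. The sketch only assumes a Solr version whose update handler accepts a JSON array of documents, and it stops at building the request body rather than sending it.

```python
import json
from typing import Callable, Iterable, List, Optional

Doc = dict
# A stage takes a document and returns a (possibly modified) document,
# or None to drop the document from the pipeline.
Stage = Callable[[Doc], Optional[Doc]]


def normalise(doc: Doc) -> Optional[Doc]:
    """Collapse runs of whitespace in the (hypothetical) 'text' field."""
    out = dict(doc)
    out["text"] = " ".join(doc.get("text", "").split())
    return out


def analyse(doc: Doc) -> Optional[Doc]:
    """Add a simple derived field; a stand-in for real analysis."""
    out = dict(doc)
    out["word_count"] = len(doc.get("text", "").split())
    return out


def route(doc: Doc) -> Optional[Doc]:
    """Drop documents that have no text left after normalisation."""
    return doc if doc.get("text") else None


def run_pipeline(docs: Iterable[Doc], stages: List[Stage]) -> List[Doc]:
    """Run every document through the stages in order."""
    results = []
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
            if doc is None:
                break  # a stage dropped this document
        else:
            results.append(doc)  # all stages completed
    return results


def to_solr_update_body(docs: List[Doc]) -> str:
    """Serialise documents as a JSON array (assumed update format)."""
    return json.dumps(docs)


if __name__ == "__main__":
    crawled = [
        {"id": "1", "text": "  hello   solr  world "},
        {"id": "2", "text": "   "},
    ]
    processed = run_pipeline(crawled, [normalise, analyse, route])
    print(to_solr_update_body(processed))
```

The point of the design is that each stage is an independent function, so stages can be added, removed or reordered without touching the code that finally talks to Solr.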

We have been using OpenPipeline, which also offers a GUI that makes it rather nice to administer (if you tweak the GUI a bit!). It does seem, though, that OpenPipeline will not really get going. Nothing happens, and there is not really any community around it – and it doesn't seem that the people behind it will ever move it further.


So we are looking around for other “pipeline” projects that can work well with Solr.


So – do any of you have any ideas on this? Any recommendations? Or any plans for this in Solr?


Thanks a lot

Best regards

Roland Villemoes
Tel: (+45) 22 69 59 62

Alpha Solutions A/S
Borgergade 2, 3.sal, DK-1300 Copenhagen K
Tel: (+45) 70 20 65 38



Gérard Dupont
Information Processing Control and Cognition (IPCC)

CASSIDIAN - an EADS company

Document & Learning team - LITIS Laboratory

