lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <>
Subject [jira] [Commented] (SOLR-2823) Re-use of UpdateProcessor configurations in multiple UpdateChains
Date Wed, 12 Oct 2011 23:09:12 GMT


Jan Høydahl commented on SOLR-2823:

Hey guys, you're jumping fast here :)

Erik, you must have peeked in my ideas book because exactly what you propose is something
I planned to introduce later, but using Groovy as the DSL :) - much like Gradle does. I think
this could be achieved by making UpdateProcessorChains pluggable and definable in solrconfig.
The DefaultUpdateProcessorChain could be the simple linear array[] of processors. The ScriptedUpdateProcessorChain
would be the powerhouse where you could do both simple linear ones as well as complex logic.
You can even do simple document manipulation inline without calling a processor, such as doc.deleteField("title")...
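The split between a default linear chain and a scripted chain could be sketched roughly as below. This is only an illustration in plain Java: `Doc`, `Processor`, `Chain`, `LinearChain` and `ScriptedChain` are hypothetical stand-ins, not Solr classes, and a Java lambda plays the part of the Groovy script. The branch taken for small companies is invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: none of these types exist in Solr under these names.
class ChainSketch {

    /** Stand-in for an input document: field name -> value. */
    static final class Doc {
        private final Map<String, Object> fields = new LinkedHashMap<>();
        Object getFieldValue(String name) { return fields.get(name); }
        void setField(String name, Object value) { fields.put(name, value); }
        void deleteField(String name) { fields.remove(name); }
        @Override public String toString() { return fields.toString(); }
    }

    /** A processor mutates the document in place. */
    interface Processor { void process(Doc doc); }

    /** A chain decides which processors run and in what order. */
    interface Chain { void run(Doc doc); }

    /** Default behaviour: a fixed list of processors, run in order. */
    static final class LinearChain implements Chain {
        private final List<Processor> processors;
        LinearChain(List<Processor> processors) {
            this.processors = new ArrayList<>(processors);
        }
        @Override public void run(Doc doc) {
            for (Processor p : processors) {
                p.process(doc);
            }
        }
    }

    /** Scripted behaviour: arbitrary logic supplied by the caller. */
    static final class ScriptedChain implements Chain {
        private final Consumer<Doc> script;
        ScriptedChain(Consumer<Doc> script) { this.script = script; }
        @Override public void run(Doc doc) { script.accept(doc); }
    }

    /** Illustrative processor: copy one field to another if it is present. */
    static Processor copyField(String from, String to) {
        return doc -> {
            Object value = doc.getFieldValue(from);
            if (value != null) {
                doc.setField(to, value);
            }
        };
    }

    static Chain simple() {
        return new LinearChain(List.of(copyField("title", "title_en")));
    }

    /** Branches on a field value; the else branch manipulates the doc inline. */
    static Chain moreComplex() {
        Chain simple = simple();
        return new ScriptedChain(doc -> {
            Object employees = doc.getFieldValue("employees");
            if (employees instanceof Integer && (Integer) employees > 10) {
                simple.run(doc);            // delegate to another chain
            } else {
                doc.deleteField("title");   // inline manipulation, no processor
            }
        });
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.setField("title", "Hello");
        doc.setField("employees", 25);
        moreComplex().run(doc);
        System.out.println(doc);
    }
}
```

Both chain types share one small interface, so a configuration layer would only need to pick the implementation; the scripted variant can still call into linear chains where that is enough.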

This approach also solves another wish of mine, namely being able to define chains outside
of solrconfig.xml. Logically, configuring schema and document processing is done by a "content"
guy, but configuring solrconfig.xml is done by the "hardware/operations" guys. Imagine a solr/conf/pipeline.groovy
referenced from solrconfig.xml:

<updateProcessorChain class="solr.ScriptedUpdateProcessorChainFactory" file="pipeline.groovy" />

chain simple {

chain moreComplex {
  if(doc.getFieldValue("employees") > 10)

chain logAndRun {

processor langid {
  class = "solr.LanguageIdentifierUpdateProcessorFactory"
  config("langid.fl", "title,body")
  config("langid.langField", "language")
  config("map", true)
}

processor copyfield {
  script = "copyfield.groovy"
  config("from", "title")
  config("to", "title_en")
}

I don't know what it takes to code such a thing, but if we had it, I'd never go back to defining
pipelines in XML :)
> Re-use of UpdateProcessor configurations in multiple UpdateChains
> -----------------------------------------------------------------
>                 Key: SOLR-2823
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>            Priority: Minor
> When dealing with multiple UpdateChains and Processors, you frequently need to re-use
> configuration. Two chains may be equal except for one config setting in one <processor>.
> I propose to allow named processor configs, which can be referenced by name in the chains.
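One way to picture the named-config proposal is a small registry in which each processor configuration is defined once and chains refer to it by name. The sketch below is a plain-Java illustration under that assumption; `NamedConfigSketch` and its methods are hypothetical and do not correspond to any Solr API. The `langid` settings are taken from the example in the comment above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "named processor configs referenced by name".
class NamedConfigSketch {

    // Processor name -> its configuration, defined exactly once.
    private final Map<String, Map<String, String>> processors = new HashMap<>();
    // Chain name -> ordered list of processor names.
    private final Map<String, List<String>> chains = new HashMap<>();

    void defineProcessor(String name, Map<String, String> config) {
        processors.put(name, Map.copyOf(config));
    }

    /** Registers a chain; every referenced processor must already be defined. */
    void defineChain(String chainName, List<String> processorNames) {
        for (String name : processorNames) {
            if (!processors.containsKey(name)) {
                throw new IllegalArgumentException("Unknown processor: " + name);
            }
        }
        chains.put(chainName, List.copyOf(processorNames));
    }

    /** Resolves a chain to the shared configurations it references, in order. */
    List<Map<String, String>> resolve(String chainName) {
        List<String> names = chains.get(chainName);
        if (names == null) {
            throw new IllegalArgumentException("Unknown chain: " + chainName);
        }
        List<Map<String, String>> resolved = new ArrayList<>();
        for (String name : names) {
            resolved.add(processors.get(name));
        }
        return resolved;
    }

    public static void main(String[] args) {
        NamedConfigSketch registry = new NamedConfigSketch();
        registry.defineProcessor("langid", Map.of(
                "langid.fl", "title,body",
                "langid.langField", "language"));
        registry.defineProcessor("copyfield", Map.of("from", "title", "to", "title_en"));

        // Two chains re-use the same langid configuration by name.
        registry.defineChain("simple", List.of("langid"));
        registry.defineChain("full", List.of("langid", "copyfield"));

        System.out.println(registry.resolve("full"));
    }
}
```

Because chains hold only names, changing the `langid` settings in one place affects every chain that references it, which is the re-use the issue asks for.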
