lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-2823) Re-use of UpdateProcessor configurations in multiple UpdateChains
Date Wed, 12 Oct 2011 23:09:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126235#comment-13126235 ]

Jan Høydahl commented on SOLR-2823:
-----------------------------------

Hey guys, you're jumping fast here :)

Erik, you must have peeked in my ideas book, because what you propose is exactly what I
planned to introduce later, only with Groovy as the DSL :) - much like Gradle does. I think
this could be achieved by making UpdateProcessorChains pluggable and definable in solrconfig.xml.
The DefaultUpdateProcessorChain could be the simple linear array of processors. The
ScriptedUpdateProcessorChain would be the powerhouse, where you could define simple linear
chains as well as complex logic. You could even do simple document manipulation inline,
without calling a processor, such as doc.deleteField("title")...
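
A minimal Java sketch of the two chain types described above. It does not use Solr's real
API: ChainSketch, Processor, linearChain and scriptedChain are hypothetical names made up
for illustration, and a plain Map stands in for the Solr document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins, not Solr classes: a document is a mutable map,
// and a processor is anything that mutates it.
final class ChainSketch {

    interface Processor extends Consumer<Map<String, Object>> {}

    // "Default" chain: a fixed, linear list of processors run in order.
    static Processor linearChain(List<Processor> processors) {
        return doc -> processors.forEach(p -> p.accept(doc));
    }

    // "Scripted" chain: arbitrary logic that may call processors
    // conditionally or edit the document inline.
    static Processor scriptedChain(Processor langid, Processor copyfield) {
        return doc -> {
            langid.accept(doc);
            Object employees = doc.get("employees");
            if (employees instanceof Integer && (Integer) employees > 10) {
                copyfield.accept(doc);
            }
            doc.remove("title"); // inline manipulation, no processor involved
        };
    }

    // Runs the scripted chain on a sample document and returns the result.
    static Map<String, Object> demo(int employees) {
        Processor langid = doc -> doc.put("language", "en");
        Processor copyfield = doc -> doc.put("title_en", doc.get("title"));

        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "Hello");
        doc.put("employees", employees);

        scriptedChain(langid, copyfield).accept(doc);
        return doc;
    }

    public static void main(String[] args) {
        // Linear chain: both processors always run, the title is kept.
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "Hello");
        Processor langid = d -> d.put("language", "en");
        Processor copyfield = d -> d.put("title_en", d.get("title"));
        linearChain(Arrays.asList(langid, copyfield)).accept(doc);
        System.out.println(doc);

        // Scripted chain: copyfield runs only for more than 10 employees.
        System.out.println(demo(25));
        System.out.println(demo(5));
    }
}
```

Both flavours end up as a Processor here, so in this sketch a scripted chain could itself
be one step of a linear chain.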

This approach also solves another wish of mine, namely being able to define chains outside
of solrconfig.xml. Logically, configuring schema and document processing is done by a "content"
guy, but configuring solrconfig.xml is done by the "hardware/operations" guys. Imagine a
solr/conf/pipeline.groovy referenced from solrconfig.xml:

{code:xml}
<updateProcessorChain class="solr.ScriptedUpdateProcessorChainFactory" file="pipeline.groovy" />
{code}

pipeline.groovy:
{code}
chain simple {
  process(langid)
  process(copyfield)
  chain(logAndRun)
}

chain moreComplex {
  process(langid)
  if(doc.getFieldValue("employees") > 10)
    process(copyfield)
  else
    chain(myOtherProcesses)
  doc.deleteField("title")
  chain(logAndRun)
}

chain logAndRun {
  process(log)
  process(run)
}

processor langid {
  class = "solr.LanguageIdentifierUpdateProcessorFactory"
  config("langid.fl", "title,body")
  config("langid.langField", "language")
  config("map", true)
}

processor copyfield {
  script = "copyfield.groovy"
  config("from", "title")
  config("to", "title_en")
}
{code}
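
The re-use idea in pipeline.groovy (define a processor or chain once, refer to it by name
from several chains) can be sketched in plain Java as follows. This is only an illustration
under stated assumptions: PipelineRegistry, define and chain are hypothetical names, not
Solr API, and the processors here just record their own name instead of doing real work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, not Solr API: processors and chains are registered
// by name so that several chains can share one definition.
final class PipelineRegistry {
    private final Map<String, Consumer<Map<String, Object>>> steps = new HashMap<>();

    // Register a named processor (or chain) once.
    void define(String name, Consumer<Map<String, Object>> step) {
        steps.put(name, step);
    }

    // Build a chain from step names. Names are resolved when the chain runs,
    // so a chain may refer to a step that is defined later.
    Consumer<Map<String, Object>> chain(String... names) {
        return doc -> {
            for (String name : names) {
                Consumer<Map<String, Object>> step = steps.get(name);
                if (step == null) {
                    throw new IllegalArgumentException("Unknown step: " + name);
                }
                step.accept(doc);
            }
        };
    }

    // Runs two chains that share "langid" and "logAndRun" and returns the
    // order in which the individual processors were invoked.
    static List<String> demo() {
        PipelineRegistry registry = new PipelineRegistry();
        List<String> trace = new ArrayList<>();

        // "simple" is defined before "logAndRun", as in pipeline.groovy.
        registry.define("simple", registry.chain("langid", "copyfield", "logAndRun"));
        registry.define("other", registry.chain("langid", "logAndRun"));
        registry.define("logAndRun", registry.chain("log", "run"));
        for (String name : new String[] {"langid", "copyfield", "log", "run"}) {
            registry.define(name, doc -> trace.add(name));
        }

        registry.chain("simple").accept(new HashMap<>());
        registry.chain("other").accept(new HashMap<>());
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because a chain is stored under a name like any other step, nesting (chain(logAndRun) in
the DSL above) needs no special handling in this sketch.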

I don't know what it takes to code such a thing, but if we had it, I'd never go back to defining
pipelines in XML :)
                
> Re-use of UpdateProcessor configurations in multiple UpdateChains
> -----------------------------------------------------------------
>
>                 Key: SOLR-2823
>                 URL: https://issues.apache.org/jira/browse/SOLR-2823
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>            Priority: Minor
>
> When dealing with multiple UpdateChains and Processors, you frequently need to re-use configuration. Two chains may be equal except for one config setting in one <processor>.
> I propose to allow named processor configs, which can be referenced by name in the chains.


