lucene-solr-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents
Date Thu, 11 Feb 2010 15:26:28 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832518#action_12832518 ]

Jan Høydahl commented on SOLR-1536:
-----------------------------------

In my head, document-level modifications belong in UpdateRequestProcessors. You can always
use SOLR-1725 to script those quickly, and configuring a chain is easily done in XML (http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section).

The trouble comes when you need to act on an analyzed version of a field, say, to match terms
against a normalized dictionary. To support this, could we allow analysis to run anywhere in
the update chain? That way we could put UpdateRequestProcessors after analysis as well:

{code:xml}
<updateRequestProcessorChain name="test">
    <processor class="org.apache.solr.update.processor.MyPreProcessorFactory" />
    <analysis />
    <processor class="org.apache.solr.update.processor.MyPostProcessorFactory" />
</updateRequestProcessorChain>
{code}

The <analysis/> element would be optional; when it is omitted, analysis would run at the end
of the chain, as it does today. I have no idea how easy such a change would be with the
current architecture.
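
To make the ordering idea concrete, here is a minimal sketch in plain Java with no Solr dependency. Everything in it is hypothetical and only for illustration: `ChainSketch`, `Step`, `analysis`, `dictionaryTagger`, and the `_tokens` field suffix are not Solr classes or conventions, and the "analysis" is just lowercasing plus whitespace splitting. It shows only that a processor placed after the analysis step can see the analyzed tokens, while the same processor placed before it cannot.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model, NOT Solr's API: a document is a map of field name to values.
class ChainSketch {

    // One step in the chain; stands in for an update processor or the analysis step.
    interface Step {
        void process(Map<String, List<String>> doc);
    }

    // Stand-in for <analysis/>: lowercases and splits on whitespace,
    // storing the tokens in a made-up "<field>_tokens" field.
    static Step analysis(String field) {
        return doc -> {
            List<String> tokens = new ArrayList<>();
            for (String value : doc.getOrDefault(field, List.of())) {
                for (String t : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                    if (!t.isEmpty()) {
                        tokens.add(t);
                    }
                }
            }
            doc.put(field + "_tokens", tokens);
        };
    }

    // Stand-in for a post-analysis processor: looks up analyzed tokens in a
    // normalized dictionary and writes the matching tags to a target field.
    static Step dictionaryTagger(String field, Map<String, String> dictionary, String target) {
        return doc -> {
            List<String> tags = new ArrayList<>();
            for (String t : doc.getOrDefault(field + "_tokens", List.of())) {
                String tag = dictionary.get(t);
                if (tag != null && !tags.contains(tag)) {
                    tags.add(tag);
                }
            }
            if (!tags.isEmpty()) {
                doc.put(target, tags);
            }
        };
    }

    // Runs the steps in order on a copy of the input document.
    static Map<String, List<String>> run(Map<String, List<String>> doc, List<Step> chain) {
        Map<String, List<String>> copy = new LinkedHashMap<>(doc);
        for (Step step : chain) {
            step.process(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("body", List.of("Visiting OSLO today"));
        Map<String, String> dictionary = Map.of("oslo", "city");

        // Tagger after analysis: it sees the normalized token "oslo".
        System.out.println(run(doc, List.of(
                analysis("body"),
                dictionaryTagger("body", dictionary, "entities"))));
    }
}
```

With the tagger placed before `analysis("body")` instead, no `entities` field is produced, which is the limitation described above for chains where analysis only happens at the very end.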

> Support for TokenFilters that may modify input documents
> --------------------------------------------------------
>
>                 Key: SOLR-1536
>                 URL: https://issues.apache.org/jira/browse/SOLR-1536
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 1.5
>            Reporter: Andrzej Bialecki 
>         Attachments: altering.patch
>
>
> In some scenarios it's useful to be able to create or modify fields in the input document
> based on analysis of other fields of this document. This need arises e.g. when indexing multilingual
> documents, or when doing NLP processing such as NER. However, currently this is not possible
> to do.
> This issue provides an implementation of this functionality that consists of the following
> parts:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that TokenFilter-s
>   created from this factory may modify fields in a SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this concept, with
>   a JUnit test.
> * DocumentBuilder modifications to support this functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

