lucene-solr-dev mailing list archives

From "Mike Perham (JIRA)" <>
Subject [jira] Commented: (SOLR-1536) Support for TokenFilters that may modify input documents
Date Thu, 11 Feb 2010 15:36:28 GMT


Mike Perham commented on SOLR-1536:

Another developer just mentioned that I might be able to use term frequency vectors (TFVs)
to implement the profanity detector. We already have termVectors="true" on the content field
because we are also using MoreLikeThis. If I can get access to the field's TFV in the
UpdateRequestProcessor (URP), I can just run through the profanities, checking for each one
in the TFV... I'm not sure if this is possible - need to check the
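
The check described above could look roughly like the following. This is only a sketch under
assumptions, not code from the patch or from Solr: the class ProfanityCheck and the method
findProfanities are hypothetical names, and a plain sorted String[] stands in for the terms of
the field's term vector. How (or whether) a URP can actually obtain that vector is the open
question in the comment and is not addressed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Solr or Lucene.
final class ProfanityCheck {

    // Returns the entries of `profanities` that occur in `sortedTerms`.
    // Assumption: `sortedTerms` holds the field's indexed terms in ascending
    // order, so each lookup can be a binary search.
    static List<String> findProfanities(String[] sortedTerms, List<String> profanities) {
        List<String> hits = new ArrayList<>();
        for (String profanity : profanities) {
            if (Arrays.binarySearch(sortedTerms, profanity) >= 0) {
                hits.add(profanity);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] terms = {"darn", "example", "heck", "text"};
        List<String> profanities = Arrays.asList("heck", "blast", "darn");
        System.out.println(findProfanities(terms, profanities)); // prints [heck, darn]
    }
}
```

Terms in a term vector are the output of the field's analysis chain, so the profanity list
would need to be normalized the same way (lowercasing, stemming and so on) before the lookup.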

> Support for TokenFilters that may modify input documents
> --------------------------------------------------------
>                 Key: SOLR-1536
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 1.5
>            Reporter: Andrzej Bialecki 
>         Attachments: altering.patch
> In some scenarios it's useful to be able to create or modify fields in the input document
> based on analysis of other fields of this document. This need arises e.g. when indexing
> multilingual documents, or when doing NLP processing such as NER. However, currently this
> is not possible to do.
> This issue provides an implementation of this functionality that consists of the following:
> * DocumentAlteringFilterFactory - abstract superclass that indicates that TokenFilter-s
>   created from this factory may modify fields in a SolrInputDocument.
> * TypeAsFieldFilterFactory - example implementation that illustrates this concept, with
>   a JUnit test.
> * DocumentBuilder modifications to support this functionality.
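
To make the idea in the issue description concrete, here is one possible reading of "type as
field", sketched without any Solr or Lucene classes. Everything in it is an assumption made for
illustration: the names TypeAsFieldSketch, Token and addTypeFields are hypothetical, a
Map<String, List<String>> stands in for SolrInputDocument, and the attached patch may well
behave differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: not the code from altering.patch.
final class TypeAsFieldSketch {

    // Minimal stand-in for an analyzed token: its text plus a type label
    // (for example a label assigned by an NER step).
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Returns a copy of `doc` in which every token's text has been added as a
    // value of the field named prefix + token type. The input map is not modified.
    static Map<String, List<String>> addTypeFields(
            Map<String, List<String>> doc, List<Token> tokens, String prefix) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (Token t : tokens) {
            out.computeIfAbsent(prefix + t.type, k -> new ArrayList<>()).add(t.text);
        }
        return out;
    }
}
```

With tokens ("Paris", "LOCATION") and ("Alice", "PERSON") and the prefix "type_", a document
that only had a content field would gain the fields type_LOCATION and type_PERSON.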

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
