lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Smiley (JIRA)" <>
Subject [jira] [Commented] (SOLR-4648) Create a PreAnalyzedUpdateProcessor
Date Wed, 03 Apr 2013 16:21:36 GMT


David Smiley commented on SOLR-4648:

Feedback on your patch:
* in Solr, it's bad form to call Class.forName() as you're doing in PreAnalzyedField; you're
supposed to use the SolrResourceLoader instead.  Instead use {{schema.getResourceLoader().findClass(implName,
* minor: you're using 4-spaces indent instead of 2 in the main value loop in your URP
* in those log.debug() calls, it's creating the string to potentially not even log it.  Instead
use this feature of SLF4J: {{log.debug("Ignoring stored part - field '{}' is not stored.",
sf.getName());}}  So few people use that great feature of SLF4J and instead rely on more bulky
surrounding conditionals.
* When you call sf.createFields(o, 1.0f) in the URP, isn't this doing analysis -- the very
analysis that you're trying to avoid by pre-analyzing?  It is.  And isn't 'o' the JSON or
similar representation of the analyzed token stream, and thus shouldn't be passed in here?
 When I run your test, I get a logged error because of this, actually.  Stepping through with
a debugger further validated this.  This must be remedied.  Looking at what FieldType.createField()
is doing, I propose you do the same in this URP:

org.apache.lucene.document.FieldType newType = new org.apache.lucene.document.FieldType();
    newType.setIndexOptions(getIndexOptions(field, val));
//getIndexOptions() is:
    IndexOptions options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
    if (field.omitTermFreqAndPositions()) {
      options = IndexOptions.DOCS_ONLY;
    } else if (field.omitPositions()) {
      options = IndexOptions.DOCS_AND_FREQS;
    } else if (field.storeOffsetsWithPositions()) {
//then what createField() does is:
    Field f = new Field(name, val, type);
This won't work for all field types (e.g. numbers) but it will for TextField (the one with
analysis!) and could potentially for a custom FieldType.  I don't think we should put much
if any work into supporting other FieldTypes.

The resulting "Field" instance can be shared from one document to the next I believe, and
so you can cache this in the URP and reset its value & tokenStream.
> Create a PreAnalyzedUpdateProcessor
> -----------------------------------
>                 Key: SOLR-4648
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.3, 5.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 4.3, 5.0
>         Attachments: SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch
> Spin-off from the discussion in SOLR-4619.
> Instead of using a PreAnalyzedField type we can use an UpdateRequestProcessor that converts
any input field values to StorableField-s, using the PreAnalyzedParser-s, and then directly
passes StorableField-s to DocumentBuilder for indexing.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message