lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4648) Create a PreAnalyzedUpdateProcessor
Date Wed, 03 Apr 2013 16:21:36 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621024#comment-13621024 ]

David Smiley commented on SOLR-4648:
------------------------------------

Feedback on your patch:
* in Solr, it's bad form to call Class.forName() as you're doing in PreAnalyzedField; you're
supposed to go through the SolrResourceLoader.  Use
{{schema.getResourceLoader().findClass(implName, PreAnalyzedParser.class);}} instead.
* minor: you're using a 4-space indent instead of 2 in the main value loop in your URP
* those log.debug() calls build the message string even when debug logging is off and the
message is never logged.  Instead use SLF4J's parameterized messages:
{{log.debug("Ignoring stored part - field '{}' is not stored.", sf.getName());}}
Few people use that great feature of SLF4J and instead rely on bulkier surrounding conditionals.
* When you call sf.createFields(o, 1.0f) in the URP, isn't this doing analysis -- the very
analysis that you're trying to avoid by pre-analyzing?  It is.  And isn't 'o' the JSON (or
similar) representation of the analyzed token stream, which therefore shouldn't be passed in here?
When I run your test, I get a logged error because of this.  Stepping through with
a debugger confirmed it.  This must be remedied.  Looking at what FieldType.createField()
is doing, I propose you do the same in this URP:

{code}
// Build the Lucene field type from the schema field's flags:
org.apache.lucene.document.FieldType newType = new org.apache.lucene.document.FieldType();
newType.setIndexed(field.indexed());
newType.setTokenized(field.isTokenized());
newType.setStored(field.stored());
newType.setOmitNorms(field.omitNorms());
newType.setIndexOptions(getIndexOptions(field, val));
newType.setStoreTermVectors(field.storeTermVector());
newType.setStoreTermVectorOffsets(field.storeTermOffsets());
newType.setStoreTermVectorPositions(field.storeTermPositions());

// getIndexOptions() is:
IndexOptions options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
if (field.omitTermFreqAndPositions()) {
  options = IndexOptions.DOCS_ONLY;
} else if (field.omitPositions()) {
  options = IndexOptions.DOCS_AND_FREQS;
} else if (field.storeOffsetsWithPositions()) {
  options = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
}

// then what createField() does is ('type' here is the newType built above):
Field f = new Field(name, val, type);
f.setBoost(boost);
{code}
This won't work for all field types (e.g. numbers) but it will for TextField (the one with
analysis!) and could potentially for a custom FieldType.  I don't think we should put much
if any work into supporting other FieldTypes.
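
To make the index-options precedence in the snippet above easy to check in isolation, here is a minimal standalone sketch.  FieldFlags and IndexOpts are hypothetical stand-ins for Solr's SchemaField and Lucene's IndexOptions (they are not the real classes), so it compiles without either library; only the if/else-if ordering is taken from the snippet.

```java
// Sketch only: FieldFlags and IndexOpts are hypothetical stand-ins, not Solr/Lucene classes.
public class IndexOptionsSketch {

  // Mirrors the four IndexOptions constants named in the snippet above.
  enum IndexOpts {
    DOCS_ONLY,
    DOCS_AND_FREQS,
    DOCS_AND_FREQS_AND_POSITIONS,
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
  }

  // Hypothetical holder for the three schema flags the snippet consults.
  static final class FieldFlags {
    final boolean omitTermFreqAndPositions;
    final boolean omitPositions;
    final boolean storeOffsetsWithPositions;

    FieldFlags(boolean omitTermFreqAndPositions,
               boolean omitPositions,
               boolean storeOffsetsWithPositions) {
      this.omitTermFreqAndPositions = omitTermFreqAndPositions;
      this.omitPositions = omitPositions;
      this.storeOffsetsWithPositions = storeOffsetsWithPositions;
    }
  }

  // Same precedence as the if / else-if chain above: the first matching flag wins.
  static IndexOpts chooseIndexOptions(FieldFlags f) {
    if (f.omitTermFreqAndPositions) {
      return IndexOpts.DOCS_ONLY;
    } else if (f.omitPositions) {
      return IndexOpts.DOCS_AND_FREQS;
    } else if (f.storeOffsetsWithPositions) {
      return IndexOpts.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    }
    return IndexOpts.DOCS_AND_FREQS_AND_POSITIONS;
  }

  public static void main(String[] args) {
    System.out.println(chooseIndexOptions(new FieldFlags(false, false, false)));
    System.out.println(chooseIndexOptions(new FieldFlags(true, true, true)));
    System.out.println(chooseIndexOptions(new FieldFlags(false, false, true)));
  }
}
```

Note that the omit flags take priority: a field that omits positions never gets offsets, even if storeOffsetsWithPositions is set.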

The resulting "Field" instance can be shared from one document to the next I believe, and
so you can cache this in the URP and reset its value & tokenStream.
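
On the log.debug() point above, a rough illustration of why the placeholder form is cheaper when debug logging is off.  MiniLogger and CountingName are hypothetical stand-ins written for this sketch (this is not SLF4J itself, and the real implementation differs); the only thing it is meant to show is that with a '{}' pattern the argument is not turned into a string unless the level is enabled, whereas concatenation at the call site always does that work.

```java
// Sketch only: MiniLogger is a hypothetical stand-in, NOT the SLF4J Logger.
public class DeferredLoggingSketch {

  // Minimal logger that substitutes '{}' only when debug is enabled.
  static final class MiniLogger {
    final boolean debugEnabled;
    final StringBuilder out = new StringBuilder();

    MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    // Parameterized form: the message is formatted lazily.
    void debug(String pattern, Object arg) {
      if (!debugEnabled) {
        return; // cheap exit: arg.toString() is never called
      }
      out.append(pattern.replace("{}", String.valueOf(arg))).append('\n');
    }

    // Pre-built form: the caller has already paid for the string.
    void debug(String message) {
      if (debugEnabled) {
        out.append(message).append('\n');
      }
    }
  }

  // Argument that counts how often it is converted to a string.
  static final class CountingName {
    int toStringCalls = 0;

    @Override
    public String toString() {
      toStringCalls++;
      return "title";
    }
  }

  public static void main(String[] args) {
    MiniLogger log = new MiniLogger(false); // debug disabled

    CountingName lazy = new CountingName();
    log.debug("Ignoring stored part - field '{}' is not stored.", lazy);
    System.out.println("placeholder form, toString calls: " + lazy.toStringCalls);

    CountingName eager = new CountingName();
    log.debug("Ignoring stored part - field '" + eager + "' is not stored.");
    System.out.println("concatenation form, toString calls: " + eager.toStringCalls);
  }
}
```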
                
> Create a PreAnalyzedUpdateProcessor
> -----------------------------------
>
>                 Key: SOLR-4648
>                 URL: https://issues.apache.org/jira/browse/SOLR-4648
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.3, 5.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 4.3, 5.0
>
>         Attachments: SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch
>
>
> Spin-off from the discussion in SOLR-4619.
> Instead of using a PreAnalyzedField type we can use an UpdateRequestProcessor that converts
> any input field values to StorableField-s, using the PreAnalyzedParser-s, and then directly
> passes StorableField-s to DocumentBuilder for indexing.


