lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4648) Create a PreAnalyzedUpdateProcessor
Date Thu, 04 Apr 2013 12:21:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622080#comment-13622080
] 

Andrzej Bialecki  commented on SOLR-4648:
-----------------------------------------

Thanks again for a thorough review. New patch is attached.

bq. in Solr, it's bad form to call Class.forName() ..

Fixed.

bq. minor: you're using 4-spaces indent instead of 2 in the main value loop in your URP

Fixed (it was actually a left-over from removing an outer-level loop, all other indents are
2 spaces).

bq. in those log.debug() calls, it's creating the string to potentially not even log it

Fixed.

bq.  Looking at what FieldType.createField() is doing, I propose you do the same in this URP
...

Funny thing, I had this in one version of the patch, and then decided to reuse SF.createField(..)
to avoid code duplication. The problem is that SchemaField.isTokenized() has package-level
visibility so it's not visible in the UP's package. I fixed this by providing a utility method
in PreAnalyzedField to create a FieldType. Also, I moved there the chunk of logic for setting
/ resetting the Field content and type flags based on SchemaField. Overall, it simplifies
the UP.

bq. The resulting "Field" instance can be shared from one document to the next I believe,
and so you can cache this in the URP and reset its value & tokenStream.

Hmm, this doesn't seem feasible at all. First, this cache would have to be thread safe, and
prevent reuse of Field instances until the document is actually processed by IndexWriter -
I don't think there's a mechanism to enforce this in the context of this class? Also it would
have to cache multiple instances of Field, because processing a single document may result
in creating multiple instances (at least one per pre-analyzed field, more if fields are multi-valued).
                
> Create a PreAnalyzedUpdateProcessor
> -----------------------------------
>
>                 Key: SOLR-4648
>                 URL: https://issues.apache.org/jira/browse/SOLR-4648
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.3, 5.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 4.3, 5.0
>
>         Attachments: SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch, SOLR-4648.patch
>
>
> Spin-off from the discussion in SOLR-4619.
> Instead of using a PreAnalyzedField type we can use an UpdateRequestProcessor that converts
any input field values to StorableField-s, using the PreAnalyzedParser-s, and then directly
passes StorableField-s to DocumentBuilder for indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message