lucene-solr-dev mailing list archives

From "Fergus McMenemie (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1120) Simplify EntityProcessor API
Date Fri, 17 Apr 2009 08:47:15 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700085#action_12700085 ]

Fergus McMenemie commented on SOLR-1120:
----------------------------------------

Good idea. I don't know whether the following is of interest.

* Further to extracting the Transformer application logic, I was wondering whether every entity
attribute read should automatically be passed through replaceTokens. Is there any legitimate
case where one would want to disallow replaceTokens? The following snippet is repeated
far too many times, but it is important if DIH is to provide simple, predictable behaviour.

{code}
    // Read the raw attribute, then resolve any tokens in it (skipped when the attribute is absent).
    s = context.getEntityAttribute(CHANGELIST_OMIT);
    if (s != null) s = resolver.replaceTokens(s);
{code}
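
One way to picture the automatic behaviour is a small wrapper that applies token replacement on every attribute read. This is only a sketch: ResolvedAttributes is a hypothetical name, and the map and function it takes are stand-ins for DIH's Context and VariableResolver, not the real API.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper, NOT part of DIH: wraps raw entity attributes so that
// every read goes through token replacement exactly once.
final class ResolvedAttributes {
    // Stand-in for the attributes an entity carries (assumption: simple name -> value map).
    private final Map<String, String> rawAttributes;
    // Stand-in for resolver.replaceTokens(String), modelled as a plain function.
    private final UnaryOperator<String> replaceTokens;

    ResolvedAttributes(Map<String, String> rawAttributes, UnaryOperator<String> replaceTokens) {
        this.rawAttributes = rawAttributes;
        this.replaceTokens = replaceTokens;
    }

    // Returns null when the attribute is absent, otherwise the value with tokens replaced.
    String get(String name) {
        String s = rawAttributes.get(name);
        return s == null ? null : replaceTokens.apply(s);
    }
}
```

With something like this in place, each call site would shrink to a single get(...) call and the null check would live in one spot.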

* The regexp transformer now has several combinations of mutually exclusive attributes. It
would be nice to check the attributes for nonsensical combinations. However, given that the
transformer is invoked for every row, such checking code could be a nasty overhead. I don't
know how to solve this, but somehow we need to catch the first invocation of a field's transformer
and do far more detailed checking of the attributes there.
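
A rough sketch of the "check only on first invocation" idea, assuming the transformer can keep a little state between rows. FirstUseAttributeCheck is a hypothetical class, and the exclusive attribute pairs are passed in by the caller rather than taken from the real regexp transformer rules.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, NOT part of DIH: runs the detailed attribute check the
// first time a field is seen and skips it on every later row.
final class FirstUseAttributeCheck {
    // Names of fields whose attributes have already been checked.
    private final Set<String> checkedFields = ConcurrentHashMap.newKeySet();
    // Each inner array holds two attribute names that must not appear together (caller-supplied).
    private final String[][] exclusivePairs;

    FirstUseAttributeCheck(String[][] exclusivePairs) {
        this.exclusivePairs = exclusivePairs;
    }

    // Returns true if the check ran on this call, false if the field was checked before.
    // Throws IllegalArgumentException when two exclusive attributes are both present.
    boolean checkOnce(String fieldName, Map<String, String> attrs) {
        // add() returns false when the name is already in the set, so later rows
        // cost a single set operation instead of the full check.
        if (!checkedFields.add(fieldName)) {
            return false;
        }
        for (String[] pair : exclusivePairs) {
            if (attrs.containsKey(pair[0]) && attrs.containsKey(pair[1])) {
                throw new IllegalArgumentException("Field '" + fieldName
                        + "': attributes '" + pair[0] + "' and '" + pair[1]
                        + "' are mutually exclusive");
            }
        }
        return true;
    }
}
```

The per-row cost after the first call is one set lookup per field, which seems cheap enough to leave in the row path.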

> Simplify EntityProcessor API
> ----------------------------
>
>                 Key: SOLR-1120
>                 URL: https://issues.apache.org/jira/browse/SOLR-1120
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>
> Writing an EntityProcessor is deceptively complex. There are so many gotchas.
> I propose the following:
> # Extract out the Transformer application logic from EntityProcessor and add it to DocBuilder.
>   Then EntityProcessor do not need to call applyTransformer or know about rowIterator and
>   getFromRowCache() methods.
> # Change the meaning of EntityProcessor#destroy to be called on end of parent's row --
>   Right now init is called once per parent row but destroy actually means the end of import.
>   In fact, there is no correct way for an entity processor to do clean up right now. Most do
>   clean up when returning null (end of data) but with the introduction of $skipDoc, a transformer
>   can return $skipDoc and the entity processor will never get a chance to clean up for the
>   current init.
> # EntityProcessor will use the EventListener API to listen for import end. This should
>   be used by EntityProcessor to do a final cleanup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

