lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3779) LineEntityProcessor processes only one document
Date Fri, 31 Aug 2012 20:16:07 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446323#comment-13446323 ]

James Dyer commented on SOLR-3779:
----------------------------------

Ahmet, thanks for reporting this and providing a fix!  I'm pretty sure this was caused by
SOLR-2382; see item #6 in its description, "change the semantics of entity.destroy()".  And
I do think your fix is correct:  just close the reader when it runs out of data, so that the
next time around it will open a new reader on the next file in the list.  LineEntityProcessor
(LEP) is the only EntityProcessor that depended on the old semantics of destroy().
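
To illustrate the idea behind the fix, here is a minimal standalone sketch.  It is not the
actual LineEntityProcessor code and does not use any Solr API: the class name
LineProcessorSketch and the methods nextRow() and readAll() are made up for illustration,
and in-memory strings stand in for files.  The point is only the pattern: when the reader
runs out of data, close it and drop the reference, so the next call opens a fresh reader
on the next source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a line-oriented entity processor (not Solr code). */
class LineProcessorSketch {
    // null means "no source is currently open"
    private BufferedReader reader;

    /** Returns the next line of the current source, or null once it is exhausted. */
    String nextRow(String currentSource) {
        try {
            if (reader == null) {
                // Lazily open a reader on whatever the parent entity currently points at.
                reader = new BufferedReader(new StringReader(currentSource));
            }
            String line = reader.readLine();
            if (line == null) {
                // Out of data: close and forget the reader here, rather than
                // relying on a later destroy() call to do it.
                reader.close();
                reader = null;
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Drives one processor instance over several sources, as a parent entity would. */
    static List<String> readAll(List<String> sources) {
        LineProcessorSketch processor = new LineProcessorSketch();
        List<String> rows = new ArrayList<>();
        for (String source : sources) {
            String line;
            while ((line = processor.nextRow(source)) != null) {
                rows.add(line);
            }
        }
        return rows;
    }
}
```

Calling LineProcessorSketch.readAll(List.of("a1\na2", "b1")) yields the lines of both
sources in order, because the reader is reset each time one source is exhausted.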

The disturbing thing here is that TestLineEntityProcessor passes, so clearly it is not testing
the combination of FileListEntityProcessor (FLEP) and LEP correctly, even though the code
comments indicate this was the intention.  Likely we need to replace this test with something
in the spirit of the test included with SOLR-3307, or at least improve the mock-up LEP with
something more realistic.  In any case, we'll need a unit test that actually fails prior to
your patch and then passes with it applied.
                
> LineEntityProcessor processes only one document
> -----------------------------------------------
>
>                 Key: SOLR-3779
>                 URL: https://issues.apache.org/jira/browse/SOLR-3779
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0-BETA
>            Reporter: Ahmet Arslan
>            Assignee: James Dyer
>             Fix For: 4.0
>
>         Attachments: SOLR-3779.patch
>
>
> LineEntityProcessor processes only one document when combined with FileListEntityProcessor.
> {code:xml}
> <dataConfig>
> <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
>     <document>
>        <entity name="f" processor="FileListEntityProcessor" fileName=".*txt" baseDir="/Volumes/data/Documents" recursive="false" rootEntity="false" dataSource="null" transformer="TemplateTransformer" >
>              <entity onError="skip" name="jc"   processor="LineEntityProcessor" url="${f.fileAbsolutePath}" dataSource="fds"  rootEntity="true" transformer="TemplateTransformer">
>           	  <field column="link" template="hello${f.fileAbsolutePath},${jc.rawLine}" />
>           	  <field column="rawLine" name="rawLine" />
>              </entity>          	  
>         </entity>
>     </document>
> </dataConfig>
> {code}


