lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-2186) DataImportHandler multi-threaded option throws exception
Date Sun, 24 Oct 2010 00:49:22 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923765#action_12923765 ]

Lance Norskog edited comment on SOLR-2186 at 10/23/10 8:47 PM:
---------------------------------------------------------------

This is the dataConfig.xml. It is very simple: it walks a directory and indexes every PDF
file it finds.
If you change threads='4' to threads='1', it still fails. If you remove the threads attribute
entirely, the import runs.

{noformat}
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity name="jc" dataSource="null"
            pk="id"
            processor="FileListEntityProcessor"
            fileName="^.*\.pdf$" recursive="false"
            baseDir="/lucid/private_pdfs/10.pdfs"
            transformer="TemplateTransformer"
            threads='4'
            >

      <field column="id" template="${jc.fileAbsolutePath}"/>

      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${jc.fileAbsolutePath}"
              parser="org.apache.tika.parser.pdf.PDFParser"
              onError="skip"
              >
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
{noformat}
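
For comparison, here is a sketch of the variant that the description above says runs: the same file with the threads attribute removed from the outer entity. It assumes nothing else in the file changes. The XML comment is added here for illustration only.

{noformat}
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Same outer entity as above, but with no threads attribute. -->
    <entity name="jc" dataSource="null"
            pk="id"
            processor="FileListEntityProcessor"
            fileName="^.*\.pdf$" recursive="false"
            baseDir="/lucid/private_pdfs/10.pdfs"
            transformer="TemplateTransformer"
            >

      <field column="id" template="${jc.fileAbsolutePath}"/>

      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${jc.fileAbsolutePath}"
              parser="org.apache.tika.parser.pdf.PDFParser"
              onError="skip"
              >
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
{noformat}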

  
> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
>                 Key: SOLR-2186
>                 URL: https://issues.apache.org/jira/browse/SOLR-2186
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Lance Norskog
>
> The multi-threaded option for the DataImportHandler throws an exception and the entire
> operation fails. This is true even if only 1 thread is configured via *threads='1'*.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



