lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-2186) DataImportHandler multi-threaded option throws exception
Date Tue, 26 Oct 2010 05:21:21 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924838#action_12924838 ]

Lance Norskog edited comment on SOLR-2186 at 10/26/10 1:20 AM:
---------------------------------------------------------------

This patch file fixes up the DataImportHandler so that the TikaEntityProcessor works under
threads.

The technique is to pass in a resolver when creating a ThreadedContext (wrapper). This allows
TikaEP.firstInit() to work. However, TikaEP.nextRow() is called with a context that has no
functioning resolver, so TikaEP caches the resolver given in firstInit() and uses it during
nextRow() instead of the one it should use. Even so, the parsed text is spewed to the logger in
addition to being indexed.
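
To make the workaround concrete, here is a minimal sketch of the caching idea in plain Java. It is not the code in TikaResolver.patch and does not use the real DataImportHandler classes: Resolver, Context, and CachingProcessor are hypothetical stand-ins invented for illustration, and the only assumption taken from the comment above is that the context seen by nextRow() cannot supply a working resolver.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a variable resolver; NOT the real DIH type.
interface Resolver {
    String resolve(String name);
}

// Hypothetical stand-in for the context handed to an entity processor.
interface Context {
    Resolver getResolver(); // assumed to return null when no resolver is wired in
}

// Hypothetical processor showing the caching workaround described above.
class CachingProcessor {
    private Resolver cached;

    // Called once; the context here is assumed to carry a working resolver.
    void firstInit(Context context) {
        cached = context.getResolver();
    }

    // Called per row; ignores the context's resolver and uses the cached one.
    Map<String, Object> nextRow(Context context) {
        if (cached == null) {
            throw new IllegalStateException("firstInit() did not receive a resolver");
        }
        Map<String, Object> row = new HashMap<>();
        row.put("url", cached.resolve("url"));
        return row;
    }
}

class ResolverCacheSketch {
    public static void main(String[] args) {
        Resolver working = name -> "resolved:" + name;
        CachingProcessor processor = new CachingProcessor();
        processor.firstInit(() -> working);                     // context with a resolver
        Map<String, Object> row = processor.nextRow(() -> null); // context without one
        System.out.println(row.get("url")); // prints "resolved:url"
    }
}
```

The sketch also shows why this only demonstrates the problem: the processor holds on to a resolver from a different context than the one it is called with, so any per-thread or per-row state in the proper resolver is bypassed.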

This is not intended as a fix patch; it merely demonstrates the problem.

The patch is made with 'git diff' and I still haven't mastered it; some 'patch' programs may
not like it.





      was (Author: lancenorskog):
    This patch file fixes up the DataImportHandler so that the TikaEntityProcessor works under
threads.

The technique is to pass in a resolver when creating a ThreadedContext (wrapper). This allows
TikaEP.firstInit() to work. However, TikaEP.nextRow() is called with a context that has no
functioning resolver, so TikaEP caches the resolver given in firstInit() and uses it during
nextRow() instead of the one it should use.

This is not intended as a fix patch; it merely demonstrates the problem.

The patch is made with 'git diff' and I still haven't mastered it; some 'patch' programs may
not like it.




  
> DataImportHandler multi-threaded option throws exception
> --------------------------------------------------------
>
>                 Key: SOLR-2186
>                 URL: https://issues.apache.org/jira/browse/SOLR-2186
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Lance Norskog
>         Attachments: TikaResolver.patch
>
>
> The multi-threaded option for the DataImportHandler throws an exception and the entire
> operation fails. This is true even if only 1 thread is configured via *threads='1'*.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

