lucene-dev mailing list archives

From "James Dyer (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3011) DIH MultiThreaded bug
Date Mon, 02 Apr 2012 13:11:24 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244160#comment-13244160 ]

James Dyer commented on SOLR-3011:
----------------------------------

If the changes in 3.6 break FileListEntityProcessor, then we should try to fix it.  A failing
unit test would help a lot.  As a workaround, you should always be able to use the 3.5 jar
with 3.6.

4.0 is only going to support single-threaded DIH configurations.  I understand that some users
have gotten performance gains using "threads" and haven't had problems.  I suspect these were
mostly cases like yours, where you're processing text documents with a fairly simple
configuration.  But looking at the code, I don't think we can guarantee that DIH with the "threads"
parameter will never encounter race conditions or similar problems.  Some configurations (especially
those using SQL, caching, etc.) were not working at all, which SOLR-3011 at least mostly fixes.
It was also getting hard to add new features, because there was no telling whether any
change would still work with "threads".

Long term, I would like to see some type of multi-threading added back in.  But we do need
to refactor the code first.  I am now looking into consolidating some of the objects that
DIH passes around, reducing member visibility, making things immutable, etc.  Some of the
classes need to be made simpler (DocBuilder comes to mind).  Hopefully this will give us a code
base that can more easily be made threadsafe in the future.
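As a rough illustration of the refactoring direction described above (immutable objects, narrower member visibility), here is a minimal Java sketch. `ImportContext` and its methods are hypothetical names invented for this example and are not part of the DIH API; the sketch only shows why an object with final fields and no mutators can be shared by several worker threads without locking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical class, NOT part of DIH: an immutable, narrowly scoped context object.
final class ImportContext {
    // Private final fields: no other class can reassign them, and final-field
    // semantics make them safely visible to other threads after construction.
    private final String entityName;
    private final Map<String, Object> params;

    ImportContext(String entityName, Map<String, Object> params) {
        this.entityName = entityName;
        // Defensive copy, then wrap, so neither the caller nor readers can mutate shared state.
        this.params = Collections.unmodifiableMap(new HashMap<>(params));
    }

    String getEntityName() { return entityName; }

    Map<String, Object> getParams() { return params; }

    // A "modification" returns a new instance instead of changing this one.
    ImportContext withParam(String key, Object value) {
        Map<String, Object> copy = new HashMap<>(params);
        copy.put(key, value);
        return new ImportContext(entityName, copy);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> initial = new HashMap<>();
        initial.put("baseDir", "/data/docs"); // hypothetical parameter
        ImportContext shared = new ImportContext("files", initial);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int worker = i;
            pool.submit(() -> {
                // Each worker derives its own copy; the shared instance is never touched.
                ImportContext local = shared.withParam("worker", worker);
                System.out.println(local.getEntityName() + " worker=" + local.getParams().get("worker"));
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // The shared context is unchanged after all workers finish: prints "false".
        System.out.println(shared.getParams().containsKey("worker"));
    }
}
```

The worker lines may print in any order; the point is that no thread can observe another thread's changes, so there is nothing to race on. Mutable objects passed between DIH components would need locking or careful confinement to get the same guarantee.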
                
> DIH MultiThreaded bug
> ---------------------
>
>                 Key: SOLR-3011
>                 URL: https://issues.apache.org/jira/browse/SOLR-3011
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>            Priority: Minor
>             Fix For: 3.6
>
>         Attachments: SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch, SOLR-3011.patch,
>                      SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch,
>                      patch-3011-EntityProcessorBase-iterator.patch
>
>
> The current DIH design is not thread safe; see the last comments on SOLR-2382 and SOLR-2947.
> I'm going to provide a patch that makes the DIH core threadsafe. It is mostly the SOLR-2947
> patch from 28th Dec.

--
This message is automatically generated by JIRA.

        


