lucene-dev mailing list archives

From "Mikhail Khludnev (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3011) DIH MultiThreaded bug
Date Sun, 26 Feb 2012 16:00:49 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-3011:
-----------------------------------

    Attachment: SOLR-3011.patch

Ok. I'm attaching a refreshed patch for the core multithreading DIH issue: SOLR-3011.patch.

h3.Code

h4.DataImporter.java 

I added DocBuilder.destroy() to stop the thread pool after all work is done. I was bothered by the test case's
warnings about "thread leaks".
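
For illustration only, here is a minimal sketch of the kind of shutdown such a destroy() method could perform. It is not the code from the patch: PoolOwnerSketch, its 5-second timeout and its method bodies are assumptions made up for this example; only the java.util.concurrent calls are real.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for DocBuilder; only the pool lifecycle is shown.
class PoolOwnerSketch {
    private final ExecutorService pool;

    PoolOwnerSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.execute(task);
    }

    // Stop accepting new work, wait briefly for queued tasks to finish,
    // then interrupt whatever is still running. The timeout is an assumption.
    // Returns true if all pool threads have terminated.
    boolean destroy() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return pool.isTerminated();
    }
}
```

Without an explicit shutdown the pool's worker threads stay alive after the import finishes, which is the kind of thing a test framework reports as a leaked thread.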

h4.DocBuilder.java 

* EntityRunner no longer creates EntityProcessors; it obtains one from its constructor arguments
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed because it was not thread-safe; it is now passed explicitly as a method
argument
* EntityRunner.entityEnded wasn't thread-safe either; it has moved into ThreadedEntityProcessorWrapper
* object instantiation was substantially reworked to be thread-safe
** a single EntityRunner per entity
** a single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers (tepw) per EntityRunner, each using the runner's EntityProcessor as its delegate
** where N is the number of threads specified on the root entity (the threads attribute is prohibited on child
entities)
** ThreadedEntityProcessorWrappers are numbered by their position in the EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always calls the child's tepw that has the same
number as its own
* a parent entity's ThreadedEntityProcessorWrapper always calls the child's tepw through a plain synchronous
Java call (without the thread pool)
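
The wiring described in the list above can be sketched roughly as follows. This is a simplified model, not the patch itself: ProcessorSketch, WrapperSketch, RunnerSketch and their methods are hypothetical names standing in for EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single EntityProcessor of an entity.
class ProcessorSketch {
    final String entityName;

    ProcessorSketch(String entityName) {
        this.entityName = entityName;
    }
}

// Hypothetical stand-in for ThreadedEntityProcessorWrapper (tepw).
class WrapperSketch {
    final int number;               // position in the owning runner's tepw list
    final ProcessorSketch delegate; // shared by all wrappers of the same runner

    WrapperSketch(int number, ProcessorSketch delegate) {
        this.number = number;
        this.delegate = delegate;
    }
}

// Hypothetical stand-in for EntityRunner: one per entity.
class RunnerSketch {
    final ProcessorSketch processor;
    final List<WrapperSketch> tepw = new ArrayList<>();
    final List<RunnerSketch> children = new ArrayList<>();

    RunnerSketch(String entityName, int threads) {
        this.processor = new ProcessorSketch(entityName);
        for (int i = 0; i < threads; i++) {
            tepw.add(new WrapperSketch(i, processor));
        }
    }

    // Child entities take their wrapper count from the parent, mirroring the
    // rule that the threads attribute is set on the root entity only.
    RunnerSketch addChild(String entityName) {
        RunnerSketch child = new RunnerSketch(entityName, tepw.size());
        children.add(child);
        return child;
    }

    // A parent wrapper always talks to the child wrapper with its own number.
    WrapperSketch childWrapperFor(WrapperSketch parentWrapper, RunnerSketch child) {
        return child.tepw.get(parentWrapper.number);
    }
}
```

In this model a worker thread that owns wrapper number i of the root entity only ever touches wrapper number i of each child, so wrapper state is never shared between threads; only the delegate processor is shared.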

h4.EntityProcessorWrapper.java
The protected transformRow() has been extracted from applyTransformer(). I need to reuse the transformer
logic for the paged flow, but applyTransformer() has a side effect on the rowcache field.
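
A rough sketch of this kind of extraction, under loud assumptions: WrapperTransformSketch is a made-up class, transformers are modelled as simple row-to-row functions, and the rowcache write in applyTransformer() is only a placeholder for whatever the real method does to that field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of the refactoring; not the real EntityProcessorWrapper.
class WrapperTransformSketch {
    private final List<UnaryOperator<Map<String, Object>>> transformers = new ArrayList<>();

    // Placeholder for the rowcache field; the real field's semantics may differ.
    final List<Map<String, Object>> rowcache = new ArrayList<>();

    void addTransformer(UnaryOperator<Map<String, Object>> transformer) {
        transformers.add(transformer);
    }

    // Extracted part: runs the transformer chain on a copy of the row and
    // touches no instance state, so other flows can reuse it.
    protected Map<String, Object> transformRow(Map<String, Object> row) {
        Map<String, Object> current = new HashMap<>(row);
        for (UnaryOperator<Map<String, Object>> transformer : transformers) {
            current = transformer.apply(current);
        }
        return current;
    }

    // Old entry point: same transformation plus the (assumed) rowcache write.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        Map<String, Object> result = transformRow(row);
        rowcache.add(result);
        return result;
    }
}
```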

h4.ThreadedEntityProcessorWrapper.java 
In addition to all the refactorings above (instantiation and the field move), it contains the core
idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it needs to pull the whole
page, i.e. all child rows that belong to the current parent row,
* the whole page is transformed and put into tepw.rowcache, from where the rows will be pulled later by
the parent entity's tepw
* the important point is the condition that enables paged mode. I believe any child entity
should be processed in paged mode. See the retrieveWholePage variable in TEPW.nextRow()
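
The paged idea can be sketched as below. This is an illustrative model rather than the patch code: DelegateSketch and PagedWrapperSketch are hypothetical, rows are plain strings, the upper-casing transform is a placeholder, and locking on the delegate object is one assumed way of "obtaining access" to it.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical thread-unaware delegate: it keeps a single cursor, so two
// threads interleaving init()/nextRow() calls would see each other's rows.
class DelegateSketch {
    private final Map<String, List<String>> childRowsByParent;
    private Iterator<String> cursor = Collections.emptyIterator();

    DelegateSketch(Map<String, List<String>> childRowsByParent) {
        this.childRowsByParent = childRowsByParent;
    }

    void init(String parentId) {
        cursor = childRowsByParent
                .getOrDefault(parentId, Collections.<String>emptyList())
                .iterator();
    }

    String nextRow() {
        return cursor.hasNext() ? cursor.next() : null;
    }
}

// Hypothetical stand-in for a tepw working in paged mode.
class PagedWrapperSketch {
    private final DelegateSketch delegate;
    private final Queue<String> rowcache = new ArrayDeque<>(); // private to this wrapper

    PagedWrapperSketch(DelegateSketch delegate) {
        this.delegate = delegate;
    }

    // Hold the delegate for the whole page: read every child row of the
    // current parent, transform it, and buffer it locally.
    void retrieveWholePage(String parentId) {
        synchronized (delegate) {
            delegate.init(parentId);
            String row;
            while ((row = delegate.nextRow()) != null) {
                rowcache.add(transform(row));
            }
        }
    }

    // Placeholder transformation.
    private String transform(String row) {
        return row.toUpperCase();
    }

    // Rows are later served from the local buffer, without touching the delegate.
    String nextRow() {
        return rowcache.poll();
    }
}
```

Because the delegate is released only after the complete page has been buffered, another wrapper can re-initialize it for a different parent row without mixing up the child rows.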

h3.Tests

h4.TestThreaded.java 
I found that this test doesn't cover the cached entity processor (where="xid=x.id") and doesn't
cover N+1 usage ("... where y.xid=${x.id}"). There was only a single child row per parent. I added
both usages with all threads attribute cases.

h1. TBD
* I have some suspicions about Context.SCOPE_DOC.
* even after this patch, multithreaded DIH suffers from SOLR-2961 and SOLR-2804. I need this patch
applied to unblock them.
* it's almost impossible to apply on 3.5. The whole of SOLR-2382, with its fixes, would have to be ported first.

Thanks

                
> DIH MultiThreaded bug
> ---------------------
>
>                 Key: SOLR-3011
>                 URL: https://issues.apache.org/jira/browse/SOLR-3011
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5, 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3011.patch, SOLR-3011.patch
>
>
> The current DIH design is not thread-safe; see the last comments at SOLR-2382 and SOLR-2947.
> I'm going to provide a patch that makes the DIH core thread-safe. It is mostly the SOLR-2947 patch from
> 28th Dec.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

