lucene-dev mailing list archives

From "Mikhail Khludnev (Updated) (JIRA)" <>
Subject [jira] [Updated] (SOLR-3011) DIH MultiThreaded bug
Date Sun, 26 Feb 2012 16:00:49 GMT


Mikhail Khludnev updated SOLR-3011:

    Attachment: SOLR-3011.patch

Ok. I'm attaching a refreshed patch for the core multithreading DIH issue: SOLR-3011.patch.


I added DocBuilder.destroy() to stop the thread pool after all work is done, because I was
bothered by the test case's warnings about "thread leaks".
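
As an illustration of the shutdown step, here is a minimal sketch of stopping a worker pool once all work is done. It is not the code from the patch: PoolOwnerSketch, its destroy(long) signature and the timeout handling are hypothetical, and only the standard java.util.concurrent calls are real.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an object that owns a worker pool; not the real DocBuilder.
class PoolOwnerSketch {
    private final ExecutorService pool;

    PoolOwnerSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.execute(task);
    }

    // Stop accepting new work, let queued tasks finish, and force a shutdown
    // if they do not finish in time. Returns true if the pool terminated.
    boolean destroy(long timeoutMillis) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow();
                return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    boolean isTerminated() {
        return pool.isTerminated();
    }
}
```

Without a call like this the pool's non-daemon threads outlive the import, which is the kind of thing a test harness reports as a thread leak.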

* EntityRunner no longer creates EntityProcessors; it obtains them from constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed as not thread-safe; it's now passed explicitly through method calls
* EntityRunner.entityEnded wasn't thread-safe either; it's moved into ThreadedEntityProcessorWrapper
* object instantiation was drastically amended to be thread-safe
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner use its EntityProcessor as a delegate
** where N is the number of threads specified at the root entity (threads attr is prohibited for child
** ThreadedEntityProcessorWrappers are numbered by their positions in EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw with the same
number as its own
* a parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw by a plain Java
synchronous call (w/o thread pool)
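
The instantiation rules above can be sketched roughly as follows. All class names here (ProcessorSketch, WrapperSketch, RunnerSketch) and their methods are hypothetical simplifications for illustration; the real DIH classes have different signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified shapes; not the real DIH classes.
class ProcessorSketch {
    final String entityName;
    ProcessorSketch(String entityName) { this.entityName = entityName; }
}

class WrapperSketch {
    final int number;               // position in the owning runner's wrapper list
    final ProcessorSketch delegate; // shared by all wrappers of one runner
    WrapperSketch(int number, ProcessorSketch delegate) {
        this.number = number;
        this.delegate = delegate;
    }
}

class RunnerSketch {
    final ProcessorSketch processor;                 // single processor per runner
    final List<WrapperSketch> wrappers = new ArrayList<>();
    final List<RunnerSketch> children = new ArrayList<>();

    RunnerSketch(String entityName, int threads) {
        this.processor = new ProcessorSketch(entityName);
        for (int i = 0; i < threads; i++) {
            wrappers.add(new WrapperSketch(i, processor)); // N wrappers, one delegate
        }
    }

    // Children inherit the thread count from the root; they cannot set their own.
    RunnerSketch addChild(String entityName) {
        RunnerSketch child = new RunnerSketch(entityName, wrappers.size());
        children.add(child);
        return child;
    }

    // Wrapper `number` of this runner calls wrapper `number` of every child
    // by a plain synchronous method call, with no thread pool involved.
    List<String> visit(int number) {
        List<String> trace = new ArrayList<>();
        WrapperSketch w = wrappers.get(number);
        trace.add(w.delegate.entityName + "#" + w.number);
        for (RunnerSketch child : children) {
            trace.addAll(child.visit(number));
        }
        return trace;
    }
}
```

In this sketch only the root's wrappers would ever be handed to pool threads; everything below the root runs on the caller's thread, so a given wrapper number is only touched by one thread at a time.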
A protected transformRow() has been extracted from applyTransformer(). I need to reuse the
transformer logic for the paged flow, but applyTransformer() has a side effect on the rowcache field.
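
A minimal sketch of this kind of split, for illustration only: TransformSplitSketch is a hypothetical class, and the way its applyTransformer() uses rowcache (keep the first row, park the rest) is an assumption, not a description of the real method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not the real EntityProcessorWrapper.
class TransformSplitSketch {
    private final Function<Map<String, Object>, List<Map<String, Object>>> transformer;
    final List<Map<String, Object>> rowcache = new ArrayList<>();

    TransformSplitSketch(Function<Map<String, Object>, List<Map<String, Object>>> transformer) {
        this.transformer = transformer;
    }

    // Side-effect-free step: only runs the transformer and returns its rows.
    protected List<Map<String, Object>> transformRow(Map<String, Object> row) {
        List<Map<String, Object>> out = transformer.apply(row);
        return out == null ? Collections.<Map<String, Object>>emptyList() : out;
    }

    // Stateful step (assumed behavior): returns the first transformed row
    // and stores any remaining rows in rowcache.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        List<Map<String, Object>> rows = transformRow(row);
        if (rows.isEmpty()) {
            return null;
        }
        rowcache.addAll(rows.subList(1, rows.size()));
        return rows.get(0);
    }
}
```

A caller that manages its own cache can use transformRow() directly and never touch the shared rowcache field.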
In addition to all the refactorings above (instantiation and the field move), it contains the core
idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it needs to pull the whole
page: all children rows that belong to the current parent row
* the whole page is transformed and put into tepw.rowcache, from where the rows will be pulled later by
the parent entity's tepw
* an important point is the condition that enables the paged mode. I believe any child entity
should be processed in paged mode. See TEPW.nextRow(), var retrieveWholePage
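
The paged flow described above can be sketched roughly like this. DelegateSketch and PagedWrapperSketch are hypothetical names, locking on the delegate object is an assumption about how exclusive access is obtained, and the retrieveWholePage condition is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical thread-unaware row source: serves the rows of one parent at a time.
class DelegateSketch {
    private final Map<String, List<Map<String, Object>>> rowsByParent;
    private Iterator<Map<String, Object>> current = Collections.emptyIterator();

    DelegateSketch(Map<String, List<Map<String, Object>>> rowsByParent) {
        this.rowsByParent = rowsByParent;
    }

    void init(String parentKey) {
        List<Map<String, Object>> rows = rowsByParent.get(parentKey);
        current = rows == null
                ? Collections.<Map<String, Object>>emptyIterator()
                : rows.iterator();
    }

    Map<String, Object> nextRow() {
        return current.hasNext() ? current.next() : null;
    }
}

// Hypothetical per-thread wrapper with its own row cache.
class PagedWrapperSketch {
    private final DelegateSketch delegate; // shared between wrappers
    private final UnaryOperator<Map<String, Object>> transformer;
    private final Deque<Map<String, Object>> rowCache = new ArrayDeque<>();

    PagedWrapperSketch(DelegateSketch delegate, UnaryOperator<Map<String, Object>> transformer) {
        this.delegate = delegate;
        this.transformer = transformer;
    }

    // Hold the delegate exclusively, drain every child row of the current
    // parent, transform each one, and keep the result in this wrapper's cache.
    void loadPage(String parentKey) {
        synchronized (delegate) {
            delegate.init(parentKey);
            Map<String, Object> row;
            while ((row = delegate.nextRow()) != null) {
                rowCache.add(transformer.apply(row));
            }
        }
    }

    // Later pulls are served from the private cache, without touching the delegate.
    Map<String, Object> nextRow() {
        return rowCache.poll();
    }
}
```

Because the whole page is drained inside one critical section, another wrapper cannot re-initialize the delegate halfway through a parent's child rows.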

I found that this test doesn't cover the cached entity processor (where="") and doesn't
cover N+1 usage ("... where y.xid=${}"). There was a single child row per parent. I added
both usages with all threads attribute cases.

h1. TBD
* I have some suspicions about Context.SCOPE_DOC.

* even after this patch, multithreaded DIH suffers from SOLR-2961 and SOLR-2804. I need this patch
applied to unblock them.
* it's almost impossible to apply on 3.5. The whole of SOLR-2382 with its fixes should be ported first.


> DIH MultiThreaded bug
> ---------------------
>                 Key: SOLR-3011
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5, 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: SOLR-3011.patch, SOLR-3011.patch
> Current DIH design is not thread-safe. See the last comments at SOLR-2382 and SOLR-2947.
I'm going to provide a patch that makes the DIH core thread-safe. Mostly it's the SOLR-2947 patch from
28th Dec.
