lucene-dev mailing list archives

From "Mikhail Khludnev (Updated) (JIRA)" <>
Subject [jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor
Date Wed, 28 Dec 2011 22:09:31 GMT


Mikhail Khludnev updated SOLR-2947:

    Attachment: SOLR-2947.patch

OK, here is the patch, which fixes the issue with destroy() and the problem with multiple threads
and CachedSqlEntityProcessor.


* Removed the SCOPE_DOC constant. I can't find any usages, and the old implementation isn't thread-safe.
We can implement it in a thread-safe way if you want; let me know if it's necessary.
* Note that ContextImpl.putVal() *ignores the scope provided*. That should be tracked
separately; let me know if you'd like me to raise an issue.

I added DocBuilder.destroy() to stop the thread pool after all work is done. I was bothered by the
test cases' warnings about "thread leaks".
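
As an illustration of the shutdown idea only, here is a minimal sketch of an owner object that stops its thread pool once the work is done. PoolOwner, its methods, and the timeout parameter are hypothetical names invented for this example; they are not the patch's DocBuilder code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a class that owns a worker pool (not the real DocBuilder).
class PoolOwner {
    private final ExecutorService pool;

    PoolOwner(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.execute(task);
    }

    // Stop accepting work, wait for submitted tasks, and cancel stragglers on timeout.
    // Returns true if every task finished within the timeout.
    boolean destroy(long timeoutMillis) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    boolean isStopped() {
        return pool.isTerminated();
    }
}
```

Without such a call the pool's non-daemon worker threads outlive the import, which is the kind of thing a test harness may report as a thread leak.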

It just introduces a getter. Note that I generated the diff against the uncommitted SOLR-2961, so line
numbers may be wrong; let me know if I should re-diff it.

* EntityRunner no longer creates EntityProcessors; it obtains one from its constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed because it was not thread-safe; it is now passed explicitly as a method argument
* EntityRunner.entityEnded wasn't thread-safe either; it moved into ThreadedEntityProcessorWrapper
* object instantiation was substantially reworked to be thread-safe:
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner, each using its EntityProcessor as a delegate
** where N is the number of threads specified at the root entity (the threads attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in the EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw with the same
number as its own
* a parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw by a plain Java
synchronous call (w/o thread pool)
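
The wiring in the list above can be sketched roughly as follows. This is a simplified model under assumptions: RunnerSketch, WrapperSketch, and all their members are hypothetical names for illustration and do not reproduce the patch's EntityRunner or ThreadedEntityProcessorWrapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one runner per entity, one shared processor per runner.
class RunnerSketch {
    final String entityName;
    final Object processor = new Object();           // the single delegate for this runner
    final List<WrapperSketch> wrappers = new ArrayList<>();
    final List<RunnerSketch> children = new ArrayList<>();

    RunnerSketch(String entityName, int threads) {
        this.entityName = entityName;
        for (int i = 0; i < threads; i++) {
            wrappers.add(new WrapperSketch(this, i)); // numbered by list position
        }
    }

    // Children get the same wrapper count as the root; they cannot pick their own.
    RunnerSketch addChild(String name) {
        RunnerSketch child = new RunnerSketch(name, wrappers.size());
        children.add(child);
        return child;
    }
}

// Hypothetical model of a per-thread wrapper around the runner's shared processor.
class WrapperSketch {
    final RunnerSketch runner;
    final int number;

    WrapperSketch(RunnerSketch runner, int number) {
        this.runner = runner;
        this.number = number;
    }

    Object delegate() {
        return runner.processor;
    }

    // Visit this wrapper, then the same-numbered wrapper of each child,
    // using a plain synchronous call rather than a thread pool.
    void run(List<String> trace) {
        trace.add(runner.entityName + "#" + number);
        for (RunnerSketch child : runner.children) {
            child.wrappers.get(number).run(trace);
        }
    }
}
```

In this model, running wrapper 2 of a parent only ever touches wrapper 2 of each descendant, so each worker thread stays on its own chain of wrappers.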
isPaged() property has been introduced
A protected transformRow() has been extracted from applyTransformer(). I need to reuse the transformer
logic for the paged flow, but applyTransformer() has a side effect on the rowcache field.
In addition to all the refactorings above (instantiation and field moves), it contains the core
idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it needs to pull the whole
page, i.e. all child records belonging to the current parent
* the whole page is transformed and put into tepw.rowcache, from where the rows will be pulled later by
the parent entity's tepw
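
A rough sketch of that paged pull, under assumptions: PagedDelegateSketch, PagingWrapperSketch, and their methods are hypothetical names, the lock on the delegate is one possible way to "obtain access", and the transformer is modelled as a single function.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical thread-unaware delegate: hands out cached child rows for one parent key.
class PagedDelegateSketch {
    private final Map<Object, List<Map<String, Object>>> rowsByParent;
    private Iterator<Map<String, Object>> cursor = Collections.emptyIterator();

    PagedDelegateSketch(Map<Object, List<Map<String, Object>>> rowsByParent) {
        this.rowsByParent = rowsByParent;
    }

    void open(Object parentKey) {
        cursor = rowsByParent.getOrDefault(parentKey, Collections.emptyList()).iterator();
    }

    // Returns null once the current page is exhausted.
    Map<String, Object> nextRow() {
        return cursor.hasNext() ? cursor.next() : null;
    }
}

// Hypothetical per-thread wrapper with its own row cache.
class PagingWrapperSketch {
    private final PagedDelegateSketch delegate;                 // shared between wrappers
    private final UnaryOperator<Map<String, Object>> transformer;
    private final Deque<Map<String, Object>> rowCache = new ArrayDeque<>();

    PagingWrapperSketch(PagedDelegateSketch delegate,
                        UnaryOperator<Map<String, Object>> transformer) {
        this.delegate = delegate;
        this.transformer = transformer;
    }

    // Hold the delegate for the whole page so another thread cannot move its cursor.
    void loadPage(Object parentKey) {
        synchronized (delegate) {
            delegate.open(parentKey);
            Map<String, Object> row;
            while ((row = delegate.nextRow()) != null) {
                rowCache.add(transformer.apply(row));
            }
        }
    }

    // Later reads come from the wrapper's own cache and need no lock.
    Map<String, Object> nextRow() {
        return rowCache.poll();
    }
}
```

Pulling the page in one locked step means the shared delegate's cursor is never interleaved between two parents, while the per-wrapper cache keeps the subsequent row-by-row reads thread-local.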

Added a full-space test for CachedSqlEP for no, 1, 2, and 10 threads (keep in mind that 1 thread does
not equal no threads).
Added a double destroy() check for EntityProcessors.

Specifies 10 threads and adds a double destroy() check for EntityProcessors.
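
One way such a double destroy() check could look is sketched below. DestroyOnceGuard is a hypothetical test helper invented for this example; it is not the check contained in the patch and does not reproduce the EntityProcessor API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical test helper: fails loudly if destroy() is invoked more than once.
class DestroyOnceGuard {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);

    // compareAndSet makes the check safe when several threads race to destroy.
    void destroy() {
        if (!destroyed.compareAndSet(false, true)) {
            throw new IllegalStateException("destroy() called twice");
        }
    }

    boolean isDestroyed() {
        return destroyed.get();
    }
}
```

A test processor could call the guard from its own destroy() so that a second call from any thread surfaces as a test failure instead of passing silently.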


> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>                 Key: SOLR-2947
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch,
> My intention is to fix multithreaded import with the SQL cache. Here is the 2nd stage. If I enable
> the DocBuilder.EntityRunner flow even for a single thread, it breaks pretty basic functionality:
> the parent-child join.
> The reason is that line 473 (entityProcessor.destroy();) breaks the children's entityProcessor.
> See the attachment comments for more details.
