lucene-dev mailing list archives

From "Mikhail Khludnev (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-3011) DIH MultiThreaded bug
Date Tue, 13 Mar 2012 20:04:40 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228649#comment-13228649
] 

Mikhail Khludnev edited comment on SOLR-3011 at 3/13/12 8:03 PM:
-----------------------------------------------------------------

James,

bq. So it seems that for this to work, not only does the core (DocBuilder etc) need to be
thread-safe, but every component in a given DIH configuration needs to be also.

To me that's a doubtful statement. I believe it's possible to have a bunch of thread-unsafe
classes synchronized by some smart orchestrator.
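
To illustrate the orchestrator idea, here is a minimal sketch. All names (`UnsafeRowSource`, `Orchestrator`, `drain`) are hypothetical and are not part of DIH; the only point is that the component itself has no locking, and a single coordinating object is the only code that touches it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class OrchestratorSketch {

    /** Hypothetical thread-unsafe component: plain mutable state, no locking. */
    static final class UnsafeRowSource {
        private int next = 0;
        private final int limit;

        UnsafeRowSource(int limit) { this.limit = limit; }

        /** Returns the next row id, or -1 at end of data. */
        int nextRow() { return next < limit ? next++ : -1; }
    }

    /** Hypothetical orchestrator: the only code that touches the unsafe component. */
    static final class Orchestrator {
        private final UnsafeRowSource source;
        private final Object lock = new Object();

        Orchestrator(UnsafeRowSource source) { this.source = source; }

        /** Serializes every access to the component behind one lock. */
        int pull() {
            synchronized (lock) {
                return source.nextRow();
            }
        }
    }

    /** Drains the source with several worker threads and returns all rows seen. */
    static List<Integer> drain(int rows, int workers) throws Exception {
        Orchestrator orch = new Orchestrator(new UnsafeRowSource(rows));
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<List<Integer>>> futures = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            futures.add(pool.submit(() -> {
                List<Integer> mine = new ArrayList<>();
                for (int row = orch.pull(); row != -1; row = orch.pull()) {
                    mine.add(row); // per-worker processing would go here
                }
                return mine;
            }));
        }
        List<Integer> all = new ArrayList<>();
        for (Future<List<Integer>> f : futures) {
            all.addAll(f.get());
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return all;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rows processed: " + drain(1000, 4).size());
    }
}
```

Each row is handed out exactly once even though `UnsafeRowSource` knows nothing about threads. Whether a real DIH configuration can be coordinated this way is exactly the open question in this thread.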

bq. There also is quite a bit of code duplication in DocBuilder and classes

Yep, agreed. ThrdEPWrapper is a FullImport-only dupe of the DocBuilder code.

bq. Mikhail, you've just noticed that MockDataSource was not designed to test a multi-threaded
scenario in a valid fashion.

Not really, they are just odd mocks. With a real DS you get a full result set from the
beginning every time, but after you reach EOF in MockDS's result set, re-querying gets you the same EOF.
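
The difference can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not the actual MockDataSource code: `RowSource`, `FreshSource` and `SharedIteratorSource` are hypothetical names and do not match the DIH `DataSource` API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class DataSourceSketch {

    /** Hypothetical stand-in for a data source; not the DIH DataSource API. */
    interface RowSource {
        Iterator<String> query();
    }

    /** Behaves like a real source: every query starts from the first row. */
    static final class FreshSource implements RowSource {
        private final List<String> rows;

        FreshSource(List<String> rows) { this.rows = rows; }

        public Iterator<String> query() { return rows.iterator(); }
    }

    /** Behaves like the mock described above: one iterator shared by all queries. */
    static final class SharedIteratorSource implements RowSource {
        private final Iterator<String> shared;

        SharedIteratorSource(List<String> rows) { this.shared = rows.iterator(); }

        public Iterator<String> query() { return shared; }
    }

    /** Runs one query to the end and counts the rows it returned. */
    static int countRows(RowSource source) {
        int n = 0;
        for (Iterator<String> it = source.query(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        RowSource fresh = new FreshSource(rows);
        RowSource shared = new SharedIteratorSource(rows);
        System.out.println("fresh:  " + countRows(fresh) + ", then " + countRows(fresh));
        System.out.println("shared: " + countRows(shared) + ", then " + countRows(shared));
    }
}
```

The fresh source returns all three rows on both queries, while the shared-iterator source returns three rows on the first query and none on the second, which is the "same EOF" effect.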

bq. Take a look at TestDocBuilderThreaded.

I've never seen it actually.

bq. 1. Keep 3.x as-is, and make any quick fixes to threads for common use-cases there, as
possible.

No quick fixes for any "common" use cases are possible, I'm sure.

bq. 2. In 4.0 (or a separate branch), remove threading from DIH.

I suggest the opposite:
* be honest with users and remove "threads" from 3.6. Zero impact here: nobody uses it, it
just doesn't work.
* I have already spent enormous effort on fixing it in 4.0, and I hope to complete the
fix anyway (it will live on GitHub at least). Btw, the reason I'm fixing 4.0 is SOLR-2382.
Actually, I waited some time for it to be committed.

bq. 4. Make DocBuilder, etc threadsafe. 5. Create a marker interface or annotation

I don't see how that's possible, or how it would really help.

bq.  The SOLR-3011 patches work on 4.x .. But I can probably help with porting (some of?)
this patch back to 3.x.

Petr found a case where the patch doesn't work. Once (if) I've fixed it, all commits around
SOLR-2382 can be cherry-picked to 3.x. Porting the fix without DIHCacheSupport will take more time.

In parallel with my proposals above, I think we really need to start designing a new Ultimate
DIH. I propose:
# pick use cases (you are experienced in extreme caching, I did throughput maximization
via async producer-consumer, Peter will give us his cases, etc.)
# sketch a design in PlantUML and check that it's bulletproof
# cut it into pieces and scrum them by crowd
Btw, isn't there already something like DIH? Maybe we can just reuse some other OSS tool or
library instead of writing it ourselves. Some time ago I heard about something called Kettle;
I don't really know what it is.


 
                
      was (Author: mkhludnev):
    James,

bq. So it seems that for this to work, not only does the core (DocBuilder etc) need to be
thread-safe, but every component in a given DIH configuration needs to be also.

For me it's doubtful statement. I believe it's possible to have bunch of threadUnsafe classes
synchronized by some smart orchestrator. 

bq. There also is quite a bit of code duplication in DocBuilder and classes

Yep. Agree, ThrdEPWrapper is a FullImport only DocBuilder code dupe.

bq. Mikhail, you've just noticed that MockDataSource was not designed to test a multi-threaded
scenario in a valid fashion.

not really, they just an odd mocks. With real DS every time you get a full resulset from the
beginning, but after you reach eof in MockDS's resultset, re-querying gets you the same eof.

bq. Take a look at TestDocBuilderThreaded.

I've never seen it actually.

bq. 1. Keep 3.x as-is, and make any quick fixes to threads for common use-cases there, as
possible.

No any quick fixes for any "common" use-cases is possible. I'm sure.

bq. 2. In 4.0 (or a separate branch), remove threading from DIH.

I suggest an opposite way:
* be honest with users and remove "threads" from 3.6. Zero impact here. Nobody use it. It
just doesn't work.
* as well I already spend enormous efforts for fixing in it 4.0. I hope I will complete the
fix anyway. (it will live at github at least). Btw, the reason why I fix 4.0 is SOLR-2382.
Actually I wait sometime before it was completed. 

bq. 4. Make DocBuilder, etc threadsafe. 5. Create a marker interface or annotation

I don't see how it's possible and be really helpful.

bq.  The SOLR-3011 patches work on 4.x .. But I can probably help with porting (some of?)
this patch back to 3.x.

Petr found a case where the patch doesn't work. After (if) I done it, all commits around SOLR-2382
can be cherrypicked to 3.x. Porting fix w/o DIHCacheSupport will take more time.

In  to my opposite proposals above, I think we really need to start a design of new Ultimate
DIH. I propose
# to pick up usecases (you are experienced in extreme caching, I did a throughput maximization
via async producer-consumer, Peter will give us his cases, etc)
# sketch a design in plant uml, check that it's bullet proof 
# cut in 


 
                  
> DIH MultiThreaded bug
> ---------------------
>
>                 Key: SOLR-3011
>                 URL: https://issues.apache.org/jira/browse/SOLR-3011
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5, 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3011.patch, SOLR-3011.patch, patch-3011-EntityProcessorBase-iterator.patch,
patch-3011-EntityProcessorBase-iterator.patch
>
>
> The current DIH design is not thread-safe; see the last comments on SOLR-2382 and SOLR-2947.
I'm going to provide a patch that makes the DIH core thread-safe. It's mostly the SOLR-2947 patch from
28th Dec.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

