lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maria Vazquez <maria.vazq...@dexone.com>
Subject Re: Multithreaded JdbcDataStore and CachedSqlEntityProcessor
Date Wed, 12 Oct 2011 16:55:36 GMT
I’ll try to test the patch this week.
Thanks!


On 10/10/11 2:01 AM, "Mikhail Khludnev" <mkhludnev@griddynamics.com> wrote:

> Hello,
> 
> Pls have a look to attached patch for
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_4_0/
> 
> It makes all tests green, but I have a several doubts in it.
> 
> Key points:
> 
> # EntityProcessorBase has context and  dataSourceRowCache fields which can't
> be shared among threads (per thread ThreadedEntityProcessorWrappers share
> single instance of EntityProcessor).
> * I removed context field, and provide it via Context.CURRENT threadlocal.
> * here is the really bad thing EntityProcessorBase.init() has a special case
> for "white box" tests, which bypass putting context into threadlocal. i.e.
> during full circuit run init() argument is ignored, but for tests it's pushing
> into threadlocal. I don't know how to improve it.  
> * I removed dataSourceRowCache and getFromRowCacheTransformed() by inlining
> it, because I see no purpose in it - it's iteration state. As a drawback
> consuming CachedSqlEntProc empties it, and after some time of thinking I've
> got that I'm wrong here. CachedSqlEntProc should be reused during debug
> sessions. I'm going to return dataSourceRowCache back, but put into
> threadlocal context too. Nevertheless, right now cacheWithWhereClause is used
> to distribute locks by PK.
> * synchronized CachedSqlEntityProcessor.init(), because it calls thread
> sensitive cacheInit()
> 
> # DocBuilder 
> * creating EntityProcessor in EntityRunner.<init> was wrong because it leads
> to creating CachedSqlEntProc per thread as Maria has spotted. I created per
> entity processors map in DocBuilder. It allows to share single
> CachedSqlEntProc among threads.
> * I implemented stacking Context which is passed through thread local.
> (push/pop&oldCtx in runAThread())
> * I was wonder that there is the second code path in additiond to
> EntityRunners: buildDocument() - stacking contexts was already done there, but
> I had to reorder init() and push rows;
> * and the third code path collectDelta() - here I had to add stacking context
> too. 
> 
> # plenty changes in tests, there two kinds of it
> * clearing context in threadlocal  before and after "white-box" tests. (in
> whiteboxing I mean unit tests that avoid "official" DocBuilder entry points)
> * and in TestCachedSqlEntityProcessor you can see that I actually broke
> Caching nature of this processor, required for debug most.
> 
> # Summary:
> * by applying this patch you'll have CachedSqlEntProc works in multiple
> threads, but for every debug request it will pull all data every time.
> * I'm going to fix this issue and provide non-invasive fix by putting
> dataSourceRowCache into a context.entitySession.
> 
> # Questions to DIH committers
> * isn't ContextImpl.putVal(String, Object, Map) wrong (it always put into
> entitySession)
> * is there any plan to refactor DIH into single codepath on EntityRunners (and
> remove separate flows for debug and delta);
> 
> --
> Mikhail
> 
> On Sat, Oct 8, 2011 at 1:47 AM, Mikhail Khludnev <mkhludnev@griddynamics.com>
> wrote:
>> Hello,
>> 
>> First of all I've got the same feeling for some time. 
>> I've checked TestThreaded
>> <http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_4_0/solr/contr
>> ib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded
>> .java>  and got that it doesn't cover CachedSqlEntityProcessor with fancy
>> where="xid=x.id <http://x.id> " feature.
>> Pls find the test for it attached:
>> dih-mt-cached-race-condition-TestThreaded.java.patch (as a patch for
>> TestThreaded from tags/lucene_solr_3_4_0) 
>> 
>> Most times testCachedMultiThread_FullImport() fails (but really rare it
>> passed) and testCachedSingleThread_FullImport() is always green. The
>> difference is one and two threads. 
>> Pls also find attached:
>> 
>> dih-mt-cached-race-condition-TestThreaded.log contains FINEST messages
>> including the root cause (my guess):
>> 2011/10/08 0:01:13
>> org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow
>> 詳細レベル (低): arow : null
>> 
>> (I'm sorry for hieroglyphs, I don't know where they comes from, it should be
>> just FINEST)
>> and corrupted documents, where some of child entity fields are missed.
>> 
>> dih-mt-cached-race-condition-TestThreaded.test.out - is the actual test
>> failure (I'm sorry but it was another run than in .log). The failed assert is
>> the searching root entities by the children terms.
>> 
>> I'm not sure but creating children entity runners in multiple threads looks
>> really suspicious for me, I hadn't have a time to look deeper into the code.
>> 
>> PS. I want to contribute two bleeding ideas (and implementations too), which
>> improves DIH much, but this race condition is a blocker for them.
>> 
>> Regards
>> 
>> On Mon, Oct 3, 2011 at 10:09 PM, Maria Vazquez <maria.vazquez@dexone.com>
>> wrote:
>>> When I'm debugging, if it is single threaded
>>> CachedSqlEntityProcessor.getAllNonCachedRows is called only once, all the
>>> rows cached and next time it requests a row it gets it from the cached data.
>>> In the logs I see the SQL call only once.
>>> 
>>> If I use multiple threads, it calls
>>> CachedSqlEntityProcessor.getAllNonCachedRows multiple times, so it creates a
>>> new cache per thread.
>>> 
>>> The cache should be shared among threads and be safe, right?
>>> 
>>> Thanks!
>>> Maria
>>> 
>>> 
>>> 
>>> 
>>> On 10/1/11 8:00 PM, "pulkitsinghal@gmail.com" <pulkitsinghal@gmail.com>
>>> wrote:
>>> 
>>>> > What part of the source code in debug mode behaved in a fashion such
as
>>>> to
>>>> > make it seem like it is not thread-safe?
>>>> >
>>>> > If it feels difficult to put into words then you can always make a small
>>>> 5 min
>>>> > screencast to demo the issue and talk about it. I do that for really
>>>> complex
>>>> > stuff with Jing by techsmith (free version).
>>>> >
>>>> > Or you can put up a small and simplified test case together to demo
the
>>>> issue
>>>> > and then paste the link for that hosted attachment :)
>>>> >
>>>> > Sent from my iPhone
>>>> >
>>>> > On Sep 30, 2011, at 1:28 PM, Maria Vazquez <maria.vazquez@dexone.com>
>>>> wrote:
>>>> >
>>>>> >> Hi,
>>>>> >> I¹m using threads with JdbcDataStore and CachedSqlEntityProcessor.
>>>>> >> I noticed that if I make it single threaded CachedSqlEntityProcessor
>>>>> behaves
>>>>> >> as expected (it only queries the db once and caches all the
rows). If I
>>>>> make
>>>>> >> it multi threaded it seems to make multiple db queries and when
I debug
>>>>> the
>>>>> >> source code it looks like it is not thread safe.
>>>>> >> Any ideas?
>>>>> >> Thanks,
>>>>> >> Maria
>>>>> >>
>>> 
>> 
>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message