lucenenet-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: Bug in Lucene static initialization with multiple threads.
Date Mon, 20 Mar 2017 20:19:11 GMT
Hi Vincent,

Thanks for reporting this. In fact, thank you for all of your assistance tracking down bugs.

This issue boils down to being a failed attempt to replace Lucene's WeakIdentityMap with a
new data structure called WeakDictionary. Since there are already tests to verify concurrency
on WeakIdentityMap and it is used in a couple of other places in Lucene, it would be far better
to get it working right than to try to fix this alternative version. I guess for the time
being your workaround should suffice (though, a fix rather than a hack would be preferred).

I have spent quite a bit of time on this, but the best I have been able to do is to get the
Lucene.Net.Tests.Util.TestWeakIdentityMap.TestConcurrentHashMap() test to pass about 50% of
the time (and I can't seem to even get it back into that state). 

Here are a couple of attempts I have made:

https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-1 - using a port
of the original Java backing classes
https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-2 - using the
.NET WeakIdentity class

And here is the original Java version: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/util/WeakIdentityMap.java


The complicated part is getting it to "reap" the elements in a thread-safe way so the counts
are right on several concurrent enumerators. Any assistance you could provide to make WeakIdentityMap
thread-safe would be much appreciated. Do note that the lead branch is now at https://github.com/apache/lucenenet/tree/api-work,
so please do any work from that branch.

Also note there are also currently a few other concurrency tests that are failing:

Lucene.Net.Tests.Index.TestIndexReaderWriter.TestDuringAddIndexes()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestControlledRealTimeReopenThread_Mem()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestCRTReopen()

I am sure that getting to the bottom of these issues will probably fix most of the issues
you are seeing. If you have any spare time, your help would be appreciated on these as well.

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Van Den Berghe, Vincent [mailto:Vincent.VanDenBerghe@bvdinfo.com] 
Sent: Tuesday, February 7, 2017 6:00 PM
To: dev@lucenenet.apache.org
Subject: Bug in Lucene static initialization with multiple threads.

Hello,

Every once in a while, I get an error when using Lucene in a multithreaded scenario (meaning:
using a single IndexWriter in multiple threads, or using a distinct IndexWriter in each thread:
it doesn't matter).
The exception chain thrown is:

Unhandled Exception: System.ArgumentException: Could not instantiate implementing class for
Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute
---> System.ArgumentException: Could not find implementing class for ICharTermAttribute
--->System.InvalidOperationException: Collection was modified; enumeration operation  may
not execute.

I could not understand what was going on, especially because it only occurred "sometimes".
It took me a while to figure out, but I think it's a bug.

Here's the stack trace of the exception when it occurs:

                [External Code]
>             Lucene.Net.dll!Lucene.Net.Support.HashMap<Lucene.Net.Support.WeakDictionary<System.Type,
System.WeakReference>.WeakKey<System.Type>, System.WeakReference>.GetEnumerator()
Line 229           C#
               [External Code]
               Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean()
Line 59           C#
               Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.CleanIfNeeded()
Line 71         C#
               Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Add(System.Type
key, System.WeakReference value) Line 134           C#
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.GetClassForInterface<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>()
Line 90  C#
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.CreateAttributeInstance<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>()
Line 70  C#
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AddAttribute<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>()
Line 350                C#
               Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.InitializeInstanceFields()
Line 658         C#
               Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.StringTokenStream()
Line 676                C#
               Lucene.Net.dll!Lucene.Net.Documents.Field.GetTokenStream(Lucene.Net.Analysis.Analyzer
analyzer) Line 629         C#
               Lucene.Net.dll!Lucene.Net.Index.DocInverterPerField.ProcessFields(Lucene.Net.Index.IndexableField[]
fields, int count) Line 105              C#
                Lucene.Net.dll!Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Lucene.Net.Index.FieldInfos.Builder
fieldInfos) Line 279          C#
                Lucene.Net.dll!Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField>
doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 287      
         C#
                Lucene.Net.dll!Lucene.Net.Index.DocumentsWriter.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField>
doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 574      
         C#
               Lucene.Net.dll!Lucene.Net.Index.IndexWriter.UpdateDocument(Lucene.Net.Index.Term
term, System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer
analyzer) Line 1830          C#
                Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField>
doc, Lucene.Net.Analysis.Analyzer analyzer) Line 1455   C#
                Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField>
doc) Line 1436   C#

... and to wit, here are the threads just rushing in to do the same:

Not Flagged                        35428    17           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        35444    11           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        44124    12           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged        >             44140    13           Worker Thread <No Name>  
             Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        47700    14           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        28168    15           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        30988    16           Worker Thread <No Name>   
            Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal
Not Flagged                        21828    6              Worker Thread <No Name> 
              Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean
               Normal

The reason why it only reproduces "sometimes" is because of this little nugget of code:

        private void CleanIfNeeded()
        {
            int currentColCount = GC.CollectionCount(0);
            if (currentColCount > _gcCollections)
            {
                Clean();
                _gcCollections = currentColCount;
            }
        }

If one thread does a Clean() operation in the middle of another Clean() operation on the same
collection that replaces the object being enumerated on, you get the exception. Always.
To avoid the intermittence, create a bunch of threads like this and eliminate the test "if
(currentColCount > _gcCollections)" so that the Clean() code is always executed. You'll
get the exception every time.

I will not post the correction, but there's a simple workaround: just make sure the static
initializers are performed in a single thread.
I.e. before creating your threads, do something like this:

new global::Lucene.Net.Documents.TextField("dummy", "dummyvalue", global::Lucene.Net.Documents.Field.Store.NO).GetTokenStream(new
(some Analyzer object));

Replace "some Analyzer object" with an instance of an Analyzer object, it doesn't matter which
one. It's meaningless, but it has the side effect of initializing the static fields without
problems.


Vincent





Mime
View raw message