lucenenet-dev mailing list archives

From "Van Den Berghe, Vincent" <Vincent.VanDenBer...@bvdinfo.com>
Subject RE: Bug in Lucene static initialization with multiple threads.
Date Tue, 21 Mar 2017 16:58:53 GMT
Hello Shad,

Regarding TestDuringAddIndexes, you found 99.9% of the problem. Allow me to contribute to
the remaining 0.1%.
There is indeed nothing wrong with the LimitedConcurrencyLevelTaskScheduler. The problem
is the ExecutionHelper<T>'s method of obtaining the results.
I see this:

            public bool MoveNext()
            {
                if (numTasks > 0)
                {
                    try
                    {
                        current = service.Take().Result;
                    }
#if !NETSTANDARD
                    catch (System.Threading.ThreadInterruptedException)
                    {
                        throw;
                    }
#endif
                    catch (Exception e)
                    {
                        // LUCENENET NOTE: We need to re-throw this as Exception to 
                        // ensure it is not caught in the wrong place
                        throw new Exception(e.ToString(), e);
                    }
                    finally
                    {
                        --numTasks;
                    }
                    return true;
                }

                return false;
            }

The call to service.Take() returns an awaitable task, which is nothing but a promise that
the result will be delivered at some future time.
Alas, the enumerator is strictly synchronous and expects the result to be available at each
iteration. 
Accessing Result without waiting for the task to end (as is done now) will yield an incorrect
result.
The shortest solution is to replace:

	current = service.Take().Result;

By:

	var awaitable = service.Take();
	awaitable.Wait();
	current = awaitable.Result;

The best solution would be to await all tasks in parallel (or at least as many as possible)
and return their value as soon as they become available. This is EXTREMELY hard to do, since
you will be essentially fighting a battle between your executor service and its execution
helper, so I leave that as an exercise for the reader.

No files to send this time, it's just these 3 lines.

Vincent

-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com] 
Sent: Tuesday, March 21, 2017 3:42 PM
To: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>
Cc: dev@lucenenet.apache.org
Subject: RE: Bug in Lucene static initialization with multiple threads.

Hi Vincent,

Sure, you can email the files to me directly.

For a quick start on Git/GitHub, there is a fairly short book called Pragmatic Version Control
Using Git (https://pragprog.com/book/tsgit/pragmatic-version-control-using-git) that gets
you up and running quickly. I think you might be attempting to push to the main repo - and
you won't have permission unless it is explicitly granted. What you need to do is to fork
the repo to your own GitHub account, then you can read/write it as much as necessary. Once
you get it to a point where you want to submit something, you can do a pull request (either
through GitHub or just manually email a request) and someone else can then review and merge
the changes.

Update

I found the source of the Lucene.Net.Tests.Index.TestIndexReaderWriter.TestDuringAddIndexes()
problem - it always occurs when you call an overload of the IndexSearcher constructor that
takes a TaskScheduler as a parameter and pass a non-null value. This is built into the test
framework (https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.TestFramework/Util/LuceneTestCase.cs#L1778)
to happen rarely, which explains many of the random failures we are seeing. If you change
the random code to never use a TaskScheduler, the test will always pass; change it to always
use a TaskScheduler and it will always fail.

The implementation of TaskScheduler we are using for testing (https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Support/LimitedConcurrencyLevelTaskScheduler.cs)
was copied directly from MSDN, so I doubt that is the issue. In fact, there is a good chance
that the issue is similar to the WeakIdentityMap issue in that there is an enumerator/call
to an enumerator that is not thread-safe; see (https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Search/IndexSearcher.cs#L474-L500)
and (https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Search/IndexSearcher.cs#L569-L590).

Anyway, I know of at least 3 tests that are failing as a result of this, so fixing it would
be a big prize.


Thanks,
Shad Storhaug (NightOwl888)

-----Original Message-----
From: Van Den Berghe, Vincent [mailto:Vincent.VanDenBerghe@bvdinfo.com] 
Sent: Tuesday, March 21, 2017 8:51 PM
To: Shad Storhaug
Cc: dev@lucenenet.apache.org
Subject: RE: Bug in Lucene static initialization with multiple threads.

Hello Shad,

I had a little time on my hands and looked into this WeakIdentityMap issue, more specifically
TestConcurrentHashMap, which fails for me as well (in 100% of the cases). Maybe I have something
to contribute; I have 3 observations.

First, notice that in TestConcurrentHashMap, 8 threads are created and then all joined by
doing the following:

            finally
            {
                foreach (var w in workers)
                {
                    w.Join(1000L);
                }
            }

This gives the first thread 1 second to end, the second one at most 2 seconds (1 second +
whatever time the first thread needed to end) and so on.  Given the amount of work of each
test thread, this is far too little time even on a fast machine. It takes 13 seconds for all
threads to end here.
The corresponding java test has the following:

   while (!exec.awaitTermination(1000L, TimeUnit.MILLISECONDS));

... which in effect just loops until the execution of each thread is finished, in units of
1 second.
In TPL, threads would be tasks and we would just be able to call Task.WaitAll. Since we're
dealing with "real" threads here, I would suggest just calling w.Join() and being done with it.
This would align the test with the Java behavior.
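To illustrate, a standalone sketch of the suggested join (the workers array here is just a stand-in for the test's worker threads):

```csharp
using System;
using System.Threading;

class JoinAllDemo
{
    static void Main()
    {
        // Stand-ins for the 8 worker threads created by the test.
        var workers = new Thread[8];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(() => Thread.Sleep(200));
            workers[i].Start();
        }

        // Join() with no timeout waits as long as each thread needs,
        // matching Java's awaitTermination loop instead of Join(1000L),
        // which gives up after an arbitrary per-thread timeout.
        foreach (var w in workers)
        {
            w.Join();
        }

        Console.WriteLine("all workers finished");
    }
}
```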

Second, there are various weaknesses in the WeakIdentityMap:
1) the implementation of the Keys enumerator (IteratorAnonymousInnerClassHelper) relies on
the order of the elements in the keys collection (outerInstance.backingStore.Keys.ElementAt(position)).
This is bad for two reasons:
- it is extremely slow (there is no indexed access on outerInstance.backingStore.Keys in any
current implementation, so ElementAt needs to skip "position" elements to get at the correct
one)
- it relies on the assumption that removing (or adding) a key in a dictionary doesn't change
the previous relative key order, which is incorrect in any current .NET implementation I am
aware of (dictionaries are hash tables with collision resolution through chaining, and reusing
of slots through a free list: it's just asking for trouble).
It turns out that you can use the backing store enumerator to implement the keys enumerator
directly. The main loop simply becomes:

			public bool MoveNext()
			{
				while (enumerator.MoveNext())
				{
					next = enumerator.Current.Key.Target;
					if (next != null)
					{
						// unfold "null" special value:
						if (next == NULL)
							next = null;
						return true;
					}
				}
				return false;
			}

This works in the non-concurrent case (because we don't touch the collection while the enumerator
is running), and in the concurrent case as well (because the ConcurrentDictionary<K,V>
enumerator works by design and handles concurrent modifications without a problem).
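The concurrent claim is easy to demonstrate outside Lucene. A standalone sketch (the exact number of entries enumerated is timing-dependent):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentEnumeratorDemo
{
    static void Main()
    {
        var map = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 10000; i++) map[i] = "v" + i;

        // Remove entries on another thread while we enumerate.
        var remover = Task.Run(() =>
        {
            for (int i = 0; i < 10000; i += 2) map.TryRemove(i, out _);
        });

        // A plain Dictionary<K,V> enumerator would likely throw
        // InvalidOperationException here; ConcurrentDictionary's
        // enumerator is documented to tolerate concurrent updates.
        int seen = 0;
        foreach (var kv in map) seen++;

        remover.Wait();
        Console.WriteLine("enumerated " + seen + " entries without an exception");
    }
}
```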

2) calling Reap() creates objects on the heap, even when there are no elements to be removed.
Sadly, not all of these allocations can be eliminated, but you can delay the creation of the
keysToRemove list until it's really needed:

			List<IdentityWeakReference> keysToRemove = null;
			foreach (IdentityWeakReference zombie in backingStore.Keys)
			{
				if (!zombie.IsAlive)
				{
					// create the list of keys to remove only if there are keys to remove.
					// this reduces heap pressure
					if (keysToRemove == null)
						keysToRemove = new List<IdentityWeakReference>();
					keysToRemove.Add(zombie);
				}
			}

			if (keysToRemove != null)
				foreach (var key in keysToRemove)
				{
					backingStore.Remove(key);
				}

Note that I don't iterate the Keys collection, but use the dictionary enumerator. Believe
it or not, this is slightly more efficient, for reasons I won't explain here since this
e-mail is already long enough.
It's sad but inevitable that a heap object is created for the dictionary enumerator, because
we call it through an interface (IDictionary<K,V>): if we had the actual object, no
enumerator object would be created on the heap.

3) Equality of weak identity references can be done using only one cast (using "as" instead
of "is"), which is more efficient.
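For illustration, the single-cast pattern looks roughly like this. WeakRef is a hypothetical stand-in for IdentityWeakReference; the real class also handles the NULL sentinel and caches the identity hash code:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for IdentityWeakReference, showing the
// single-cast equality.
sealed class WeakRef
{
    private readonly WeakReference reference;

    public WeakRef(object target) { reference = new WeakReference(target); }

    public object Target => reference.Target;

    public override bool Equals(object obj)
    {
        // One "as" cast replaces an "is" test followed by a second,
        // redundant cast.
        var other = obj as WeakRef;
        // Weak *identity* references compare targets by reference.
        return other != null && ReferenceEquals(Target, other.Target);
    }

    // Identity hash of the target, independent of any Equals override.
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(Target);
}

class WeakRefDemo
{
    static void Main()
    {
        var o = new object();
        Console.WriteLine(new WeakRef(o).Equals(new WeakRef(o)));         // True
        Console.WriteLine(new WeakRef(o).Equals("not a weak reference")); // False
    }
}
```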


Third, the test itself uses enumerators in a nonstandard manner. The two witness cases are:

	IEnumerator<string> iter = map.Keys.GetEnumerator();
	Assert.IsTrue(iter.MoveNext());
	Assert.IsNull(iter.Current);
	Assert.IsFalse(iter.MoveNext());
	Assert.IsFalse(iter.MoveNext());

And

            for (IEnumerator<string> iter = map.Keys.GetEnumerator(); iter.MoveNext();)
            {
                //Assert.IsTrue(iter.hasNext()); // try again, should return same result!
                string k = iter.Current;
	...
            }

All the other instances are variants of these witnesses.
The correct way of using IEnumerator<T> is by calling IEnumerator<T>.Dispose()
after you're finished with the instance. Note that in Lucene itself, foreach() is used which
does it correctly (ByteBufferIndexInput.cs):

                    foreach (ByteBufferIndexInput clone in clones.Keys)
                    {
                        clone.UnsetBuffers();
                    }

All usages of enumerators in TestWeakIdentityMap.cs must be rewritten accordingly. For example:

	using (IEnumerator<string> iter = map.Keys.GetEnumerator())
	{
		Assert.IsTrue(iter.MoveNext());
		Assert.IsNull(iter.Current);
		Assert.IsFalse(iter.MoveNext());
		Assert.IsFalse(iter.MoveNext());
	}
And

            foreach (object k in map.Keys)
            {
	...
            }

In case you are wondering why this is so important: you cannot guarantee that future implementations
of an enumerator (especially one on a concurrent collection) won't have cleanup to do
to get rid of various synchronization objects. Right now this isn't the case, but you never
know what the future will bring. And besides, nice guys Dispose() after their enumeration
<g>.

The test passes now. Every time.

I've made the changes to api-work in my local repository, but when I tried to "push" or "sync"
them, I get:

	Error encountered while pushing to the remote repository: Response status code does not indicate
success: 403 (Forbidden).

I know next to nothing about GitHub. Can I e-mail the changed files to someone?

Vincent

-----Original Message-----
From: Shad Storhaug [mailto:shad@shadstorhaug.com] 
Sent: Monday, March 20, 2017 9:19 PM
To: Van Den Berghe, Vincent <Vincent.VanDenBerghe@bvdinfo.com>
Cc: dev@lucenenet.apache.org
Subject: RE: Bug in Lucene static initialization with multiple threads.

Hi Vincent,

Thanks for reporting this. In fact, thank you for all of your assistance tracking down bugs.

This issue boils down to being a failed attempt to replace Lucene's WeakIdentityMap with a
new data structure called WeakDictionary. Since there are already tests to verify concurrency
on WeakIdentityMap and it is used in a couple of other places in Lucene, it would be far better
to get it working right than to try to fix this alternative version. I guess for the time
being your workaround should suffice (though, a fix rather than a hack would be preferred).

I have spent quite a bit of time on this, but the best I have been able to do is to get the
Lucene.Net.Tests.Util.TestWeakIdentityMap.TestConcurrentHashMap() test to pass about 50% of
the time (and I can't seem to even get it back into that state). 

Here are a couple of attempts I have made:

https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-1 - using a port
of the original Java backing classes
https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-2 - using the
.NET WeakIdentity class

And here is the original Java version: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/util/WeakIdentityMap.java


The complicated part is getting it to "reap" the elements in a thread-safe way so the counts
are right on several concurrent enumerators. Any assistance you could provide to make WeakIdentityMap
thread-safe would be much appreciated. Do note that the lead branch is now at https://github.com/apache/lucenenet/tree/api-work,
so please do any work from that branch.

Also note that there are currently a few other concurrency tests failing:

Lucene.Net.Tests.Index.TestIndexReaderWriter.TestDuringAddIndexes()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestControlledRealTimeReopenThread_Mem()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestCRTReopen()

I am sure that getting to the bottom of these issues will probably fix most of the issues
you are seeing. If you have any spare time, your help would be appreciated on these as well.

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Van Den Berghe, Vincent [mailto:Vincent.VanDenBerghe@bvdinfo.com] 
Sent: Tuesday, February 7, 2017 6:00 PM
To: dev@lucenenet.apache.org
Subject: Bug in Lucene static initialization with multiple threads.

Hello,

Every once in a while, I get an error when using Lucene in a multithreaded scenario (meaning:
using a single IndexWriter in multiple threads, or using a distinct IndexWriter in each thread:
it doesn't matter).
The exception chain thrown is:

Unhandled Exception: System.ArgumentException: Could not instantiate implementing class for
Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute
---> System.ArgumentException: Could not find implementing class for ICharTermAttribute
---> System.InvalidOperationException: Collection was modified; enumeration operation may
not execute.

I could not understand what was going on, especially because it only occurred "sometimes".
It took me a while to figure out, but I think it's a bug.

Here's the stack trace of the exception when it occurs:

                [External Code]
>               Lucene.Net.dll!Lucene.Net.Support.HashMap<Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.WeakKey<System.Type>, System.WeakReference>.GetEnumerator() Line 229
                [External Code]
                Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean() Line 59
                Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.CleanIfNeeded() Line 71
                Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Add(System.Type key, System.WeakReference value) Line 134
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.GetClassForInterface<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 90
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.CreateAttributeInstance<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 70
                Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AddAttribute<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 350
                Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.InitializeInstanceFields() Line 658
                Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.StringTokenStream() Line 676
                Lucene.Net.dll!Lucene.Net.Documents.Field.GetTokenStream(Lucene.Net.Analysis.Analyzer analyzer) Line 629
                Lucene.Net.dll!Lucene.Net.Index.DocInverterPerField.ProcessFields(Lucene.Net.Index.IndexableField[] fields, int count) Line 105
                Lucene.Net.dll!Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Lucene.Net.Index.FieldInfos.Builder fieldInfos) Line 279
                Lucene.Net.dll!Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 287
                Lucene.Net.dll!Lucene.Net.Index.DocumentsWriter.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 574
                Lucene.Net.dll!Lucene.Net.Index.IndexWriter.UpdateDocument(Lucene.Net.Index.Term term, System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer) Line 1830
                Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer) Line 1455
                Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc) Line 1436

... and to wit, here are the threads just rushing in to do the same:

Not Flagged        35428  17  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        35444  11  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        44124  12  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged    >   44140  13  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        47700  14  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        28168  15  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        30988  16  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged        21828   6  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal

The reason why it only reproduces "sometimes" is because of this little nugget of code:

        private void CleanIfNeeded()
        {
            int currentColCount = GC.CollectionCount(0);
            if (currentColCount > _gcCollections)
            {
                Clean();
                _gcCollections = currentColCount;
            }
        }

If one thread does a Clean() operation in the middle of another Clean() operation on the same
collection that replaces the object being enumerated on, you get the exception. Always.
To remove the intermittence, create a bunch of threads like this and eliminate the test "if
(currentColCount > _gcCollections)" so that the Clean() code is always executed. You'll
get the exception every time.
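One obvious way to serialize the cleanups would be a lock around the check-and-clean. Illustrative only: _cleanLock is an invented field, the class below is a hollowed-out shell of WeakDictionary, and this merely prevents concurrent Clean() calls rather than being a full correction:

```csharp
using System;

// Illustrative shell of WeakDictionary's cleanup path; only the
// synchronization is the point here.
class WeakDictionaryShell
{
    private readonly object _cleanLock = new object();
    private int _gcCollections;

    private void Clean()
    {
        // (the real implementation rebuilds the backing collection,
        // which is what races with a concurrent enumeration)
    }

    public void CleanIfNeeded()
    {
        // Serializing the check-and-clean prevents two threads from
        // running Clean() concurrently over the same collection.
        lock (_cleanLock)
        {
            int currentColCount = GC.CollectionCount(0);
            if (currentColCount > _gcCollections)
            {
                Clean();
                _gcCollections = currentColCount;
            }
        }
    }
}

class CleanDemo
{
    static void Main()
    {
        new WeakDictionaryShell().CleanIfNeeded();
        Console.WriteLine("ok");
    }
}
```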

I will not post the correction, but there's a simple workaround: just make sure the static
initializers are performed in a single thread.
I.e. before creating your threads, do something like this:

new global::Lucene.Net.Documents.TextField("dummy", "dummyvalue", global::Lucene.Net.Documents.Field.Store.NO).GetTokenStream(someAnalyzer);

Replace "someAnalyzer" with an instance of an Analyzer object; it doesn't matter which
one. The call itself is meaningless, but it has the side effect of initializing the static fields
without problems.


Vincent



