lucenenet-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Davidson <descenter...@hotmail.com>
Subject Re: IndexReader disposal causing unexpected 'AlreadyClosedException'
Date Mon, 24 Mar 2014 18:56:30 GMT
Hi all,

I don't recall finding anything in the source to indicate that absence
of a Dispose call would leak anything important, but being a bit
paranoid about that stuff I made sure it got called correctly anyway in
my own Lucene.NET wrapper.

I implemented a shim which returned a disposable 'session' object
wrapping a Reader and Searcher, which would IncRef at construction and
DecRef at disposal. The factory would Dispose the reader and searcher
before replacing them. Each new shim object would atomically take
references to them. There was something about lazily creating a Searcher
too, since the refresh rate was such that not every snapshot would get
searched against.

This was mainly for the purposes of instrumentation, because we had
issues with suspected resource leaks somewhere and needed to guarantee
proper cleanup of absolutely everything. As I recall, Dispose pretty
much just calls DecRef and lets the reference-counting clean stuff up.
So as long as construction is matched with Dispose and every IncRef is
matched with a DecRef, everything should get 'disposed' at the right
time.

Alex

On Mon, 2014-03-24 at 19:37 +0100, Simon Svensson wrote:
> Hi,
> 
> I wrote that answer in 2011 and have continued building code on the 
> assumption that the normally occurring garbage collection solves 
> everything, so far without any issues. The usual logic goes...
> 
>      public void Update(Entity x) {
>          // writer.AddDocument, writer.Commit, ...
>          _reader = _reader.Reopen();
>          _searcher = null;
>      }
> 
>      public IndexSearcher GetSearcher() {
>          var searcher = _searcher;
>          if (searcher == null) {
>              searcher = new IndexSearcher(_reader);
>              _searcher = searcher;
>          }
> 
>          return searcher;
>      }
> 
> While there are some threading issues with this code, I've never had any 
> memory issues with this implementation. Old reader instances (from 
> Reopen) are thrown away once any active searchers are done, same thing 
> for old searcher instances. Duplicate searcher instances (threading 
> issue) are also garbage collected.
> 
> // Simon
> 
> On 24/03/14 19:07, Paul Irwin wrote:
> > Yes, relying on GC is Simon's answer in that SO question and has been my
> > approach as well and it hasn't caused any problems. There isn't any complex
> > or unmanaged state in the
> > IndexSearcher/IndexReader/DirectoryReader/SegmentReader, so GC cleans it up
> > nicely. I would be interested to know if anyone has found an issue with
> > this approach, though.
> >
> >
> > On Mon, Mar 24, 2014 at 1:37 PM, Allan, Brad (Bracknell) <
> > Brad.Allan@fiserv.com> wrote:
> >
> >> Thanks Paul,
> >> Great answer!
> >>
> >> One further query around your suggestion, in the following scenario:
> >> - Thread 1 has a searcher from the manager.
> >> - Some index related changes happen (i.e. reopen will be necessary)
> >> - Thread 2 asks for a searcher - cool, we return a 'new' IndexSearcher
> >> - Thread 1 is done with searching.......nothing would seem to Dispose the
> >> 'old' IndexReader?  So do I just trust .NET GC to clean it up because my
> >> application now has zero refs?
> >>
> >> -----Original Message-----
> >> From: Paul Irwin [mailto:pirwin@feature23.com]
> >> Sent: 24 March 2014 16:43
> >> To: user@lucenenet.apache.org
> >> Subject: Re: IndexReader disposal causing unexpected
> >> 'AlreadyClosedException'
> >>
> >> Brad,
> >>
> >> Yes the technique is similar however you are calling Dispose on an object
> >> that might have references to it elsewhere, hence your exception. So best
> >> practice, unless you have to implement some sort of global disposal
> >> strategy after testing shows that GC isn't effective enough, is to let GC
> >> take care of it. See here: http://stackoverflow.com/a/5491000/44636
> >>
> >> Also static variables would be preferred to storing in a cache as caches
> >> are typically LRU or memory-pressure managed, and may end up being
> >> serialized out of process as your architecture grows. Keeping it in a
> >> static variable makes it clear that you don't want it cleaned up under
> >> memory pressure and that it can't be serialized for out-of-process use.
> >> Maybe try something like this, if you have to constantly make sure it's up
> >> to date. Note the use of IsCurrent() to check to see if you need to
> >> Reopen() it, and that if you're mainly dealing with IndexSearcher and not
> >> IndexReader directly, try to stick to using IndexSearcher and keep the
> >> IndexReader hidden. The IndexSearcher will have a reference to the reader,
> >> and that should be singleton as well for performance reasons.
> >>
> >> public static class IndexSearcherManager
> >>      {
> >>          private static readonly Directory _dir;
> >>          private static IndexReader _reader;
> >>          private static IndexSearcher _searcher;
> >>          private static readonly object _lock = new object();
> >>
> >>          static IndexSearcherManager()
> >>          {
> >>              _dir = new RAMDirectory(); // replace with your dir
> >>              _reader = IndexReader.Open(_dir, true);
> >>              _searcher = new IndexSearcher(_reader);
> >>          }
> >>
> >>          public static Directory Directory
> >>          {
> >>              get { return _dir; }
> >>          }
> >>
> >>          public static IndexReader IndexReader
> >>          {
> >>              get
> >>              {
> >>                  ReopenIfNecessary();
> >>                  return _reader;
> >>              }
> >>          }
> >>
> >>          public static IndexSearcher IndexSearcher
> >>          {
> >>              get
> >>              {
> >>                  ReopenIfNecessary();
> >>                  return _searcher;
> >>              }
> >>          }
> >>
> >>          private static void ReopenIfNecessary()
> >>          {
> >>              if (_reader.IsCurrent())
> >>                  return;
> >>
> >>              lock (_lock)
> >>              {
> >>                  if (_reader.IsCurrent())
> >>                      return;
> >>
> >>                  _reader = _reader.Reopen();
> >>                  _searcher = new IndexSearcher(_reader);
> >>              }
> >>          }
> >>      }
> >>
> >>
> >> On Mon, Mar 24, 2014 at 12:25 PM, Allan, Brad (Bracknell) <
> >> Brad.Allan@fiserv.com> wrote:
> >>
> >>> Thanks Paul for responding.
> >>> The background timer is not an option for the application.
> >>>
> >>> The 'cache' is only administering a single index reader for each index
> >>> in my system, and the cache does perform a reopen before returning the
> >>> reader, so I am using a technique similar to what you are describing.
> >>>
> >>> It seems to me that I can call IncRef() and DecRef(). For the latter
> >>> I'll have to add something to the cache that allows 'getters' to let
> >>> the cache know they are done.
> >>>
> >>> -----Original Message-----
> >>> From: Paul Irwin [mailto:pirwin@feature23.com]
> >>> Sent: 24 March 2014 16:07
> >>> To: user@lucenenet.apache.org
> >>> Cc: lucene-net-user@lucene.apache.org
> >>> Subject: Re: IndexReader disposal causing unexpected
> >>> 'AlreadyClosedException'
> >>>
> >>> I'd recommend trying to keep the IndexReader instance as a singleton
> >>> static variable. It's safe to use across multiple threads, so there's
> >>> no benefit to disposing and creating new instances. Just call
> >>> .Reopen() on it when the data changes.
> >>>
> >>> What's happening here is the "reader" variable is being returned from
> >>> the first call to the method, then on the second call, you're getting
> >>> the same instance back from cache and then calling dispose on it,
> >>> while the "reader1" variable still has a reference to that instance,
> >>> so when you do anything on "reader1" it's already disposed. So yeah, I
> >>> would recommend keeping an instance open singleton, never calling
> >>> Dispose on it, and calling Reopen when needed.
> >>>
> >>> If you need to naively call Reopen periodically because you don't know
> >>> when the index changes, you could do that on a background timer tick
> >>> if you can accept slightly stale data.
> >>>
> >>> Paul
> >>>
> >>>
> >>> On Mon, Mar 24, 2014 at 11:55 AM, Allan, Brad (Bracknell) <
> >>> Brad.Allan@fiserv.com> wrote:
> >>>
> >>>> I've create some pseudo code to try and describe what I'm seeing.
> >>>> This 'problem' can also be replicated without multiple threads
> >>>> (sessions as I describe them below) if the order of operations is as
> >> described below.
> >>>> How am I misunderstanding the Lucene docs (around Reopen() method)?
> >>>> I have looked into the source code and the Dispose does appear to
> >>>> not want to close the reader if there are existing refs. What do I
> >>>> need to do in my code to ensure the following scenario does not occur?
> >>>>
> >>>> A solution I'm mulling over is to keep an additional reference count
> >>>> for the readers I dish out from 'MyCache' and only Dispose the
> >>>> reader when MyCache knows there are no existing refs to the 'old'
> >> reader.
> >>>> Wise words appreciated. Thanks.
> >>>>
> >>>> // Session 1
> >>>> var reader1 = GetIndexReaderFromMyCache("MyTestIndex");
> >>>>
> >>>> // Session 2
> >>>> ....some changes are made to 'MyTestIndex'.....
> >>>>
> >>>> // Session 3
> >>>> var reader3 = GetIndexReaderFromMyCache("MyTestIndex");
> >>>>
> >>>> // Session 1
> >>>> ....perform a search with reader1....
> >>>> BOOM! get 'AlreadyClosedException'
> >>>>
> >>>> Where:
> >>>> -----------
> >>>> IndexReader GetIndexReaderFromMyCache(string indexName) {
> >>>>       lock(lockObject) {
> >>>>             IndexReader reader = MyCache.Get(indexName);
> >>>>             IndexReader newReader = reader.Reopen();
> >>>>             if (newReader != reader) {
> >>>>                  reader.Dispose();
> >>>>             }
> >>>>             reader = newReader;
> >>>>             MyCache.Update(indexName, reader);
> >>>>             return reader;
> >>>>       }
> >>>> }
> >>>>
> >>>>
> >>>> ________________________________
> >>>>
> >>>> CheckFree Solutions Limited (trading as Fiserv) Registered Office:
> >>>> Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
> >>>> Registered in England: No. 2694333
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Paul Irwin
> >>> Lead Software Engineer
> >>> feature[23]
> >>>
> >>> Email: pirwin@feature23.com
> >>> Cell: 863-698-9294
> >>>
> >>> ________________________________
> >>>
> >>> CheckFree Solutions Limited (trading as Fiserv) Registered Office:
> >>> Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
> >>> Registered in England: No. 2694333
> >>>
> >>
> >>
> >> --
> >>
> >> Paul Irwin
> >> Lead Software Engineer
> >> feature[23]
> >>
> >> Email: pirwin@feature23.com
> >> Cell: 863-698-9294
> >>
> >> ________________________________
> >>
> >> CheckFree Solutions Limited (trading as Fiserv)
> >> Registered Office: Eversheds House, 70 Great Bridgewater Street,
> >> Manchester, M15 ES
> >> Registered in England: No. 2694333
> >>
> >
> >
> 

-- 
Sent from a real computer


Mime
View raw message