lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: WeakIdentityMap high memory usage
Date Thu, 08 Aug 2013 13:09:49 GMT
Hi Mike,

I don't think disabling unmap by default is a good idea. The issue is not only the wasted
64-bit address space (which is not a problem at all, you are right); the JVM also "sits" on
those files:
- On Windows they cannot be deleted (not even on Java 7 w/ Lucene trunk, where you can now
delete them if they are not mmapped) - this may cause major pain...!
- On POSIX the disk space stays locked, so the inode can only be freed once GC has freed the
mapping
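
To illustrate the point about the JVM "sitting" on mapped files, here is a minimal
sketch using plain JDK NIO only. It does not use Lucene's MMapDirectory; the class
name MappingOutlivesChannel and the helper mapThenClose are made up for this example.
It shows that closing the FileChannel does not release the mapping: the buffer stays
readable, and whether the file can then be deleted depends on the platform.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappingOutlivesChannel {

    /** Maps the whole file read-only, closes the channel, and returns the buffer. */
    static MappedByteBuffer mapThenClose(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } // the channel is closed here; the mapping itself is not released
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, "lucene".getBytes(StandardCharsets.US_ASCII));

        // The buffer is still readable although its channel is already closed.
        MappedByteBuffer buf = mapThenClose(file);
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        System.out.println(new String(data, StandardCharsets.US_ASCII));

        // Platform dependent: Windows typically refuses the delete while the
        // mapping exists; POSIX removes the name but keeps the inode and its
        // disk space until the mapping is gone.
        try {
            Files.delete(file);
            System.out.println("deleted while mapped");
        } catch (IOException e) {
            System.out.println("delete refused while mapped: " + e);
        }
    }
}
```

Without an explicit unmap, the only thing that releases such a mapping is GC
collecting the buffer, which is why a large, rarely collected heap keeps the files
pinned for longer.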

Uwe 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Thursday, August 08, 2013 2:18 PM
> To: Lucene Users
> Subject: Re: WeakIdentityMap high memory usage
> 
> Thanks for bringing closure.
> 
> Note that you should still run a tight ship, ie don't give excess heap to
> Lucene, and instead let the OS take up the slack of any spare RAM for IO
> caching.  Especially with unmap disabled, the JVM will now only unmap once
> a map is GC'd, so the larger your heap the longer these unused maps are
> held open.
> 
> Maybe we should disable unmap by default; I don't see what value it brings
> for 64 bit envs.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Wed, Aug 7, 2013 at 9:34 PM, Denis Bazhenov <dotsid@gmail.com>
> wrote:
> > Uwe, Michael, thank you very much for your help. We have deployed one of
> > the nodes in our system and tomorrow I'll have more information on that,
> > but it seems that the setUseUnmap(false) trick did the job. RT dropped
> > significantly compared to the 3.6.0 version. We have about 100 rps per
> > search node and a commit interval of about 1 minute, so switching off the
> > unmap seems like a good idea.
> >
> > There is one more question. As far as I understand, this map is like a
> > fuse for situations where clients continue to use an IndexReader after it
> > is already closed. So if the code is correctly closing IndexReaders (only
> > after all clients have finished using them), there is no need to use this
> > sort of weak map hashing. Did I get it right?
> >
> > On Aug 8, 2013, at 4:31 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:
> >
> >> Hi Denis,
> >>
> >> I assume you are using Lucene 3.6.0, because in Lucene 3.6.1 the tracking
> >> of buffers using weak references is also done (although you cannot switch
> >> it off, unfortunately).
> >>
> >> I can confirm what Mike says: It's all weak references and the overhead
> >> may be large, but it gets freed when memory gets low. In general it's in
> >> most cases better not to allocate too much heap space for Lucene, as this
> >> makes those maps larger and GC gets stressed. Only use as much memory as
> >> needed so that no OOM occurs, and instead free all remaining memory for
> >> the file system cache (so it has less paging). In that case, GC will
> >> clean up the concurrent maps faster.
> >>
> >> In general: If you have a large index that changes seldom, but your query
> >> rate is very high (like 200 queries per second), switch unmapping off
> >> (works since Lucene 4.2, see changelog for LUCENE-4740 - unfortunately
> >> the issue itself was closed for 4.4, 4.2 would be correct). In that case
> >> there is no need to take care of unmapping, and as the index reopen rate
> >> is low, this does not waste resources.
> >>
> >> But if your index changes often, there is no way around unmapping - or
> >> use NIOFSDir with NRTCachingDirectory for the optimization of near real
> >> time search with highly changing indexes!
> >>
> >> Finally: The only way to fix this would be to make all codec structures
> >> like TermsEnum or DocsEnum, but also Scorer/DocIdSet/..., implement
> >> Closeable. When you are done with a Scorer you have to close it and the
> >> underlying cloned IndexInput would be closed, too. In that case, the
> >> cloned IndexInput would be refcounted and unmapped when the last clone is
> >> closed. This is a larger change and might be an idea for Lucene 5.0 as an
> >> "optimization". It would be a backwards break because all codecs and all
> >> queries would need to close correctly, but with our test framework and
> >> MockDirWrapper (and other MockFooBarWrappers) we could track this so all
> >> resources are closed.
> >> We had TermEnum.close() up to Lucene 3.x, but it was dropped in 4.0
> >> because it was never working in 3.x (nobody ever called close() on
> >> TermEnum or TermDocs instances.... :( ). With our new test framework this
> >> could be tracked now... So maybe worth a try?
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >>> -----Original Message-----
> >>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> >>> Sent: Wednesday, August 07, 2013 3:45 PM
> >>> To: Lucene Users
> >>> Subject: Re: WeakIdentityMap high memory usage
> >>>
> >>> This map is used to track all cloned open files, which can be a very
> >>> large number over time (each search will create maybe 3 of them).
> >>>
> >>> This is done as a "best effort" to prevent SEGV (JVM dies) if you
> >>> accidentally try to use an IndexReader after it was closed, while using
> >>> MMapDirectory.
> >>>
> >>> However, it's a weak map, which means when HEAP is tight GC should
> >>> drop it.
> >>>
> >>> So, this should not cause a real problem in "real life", even though
> >>> it looks scary when you look at its RAM usage under a profiler.
> >>>
> >>> If somehow it's causing "real life" problems, please report back!
> >>> But a simple workaround is to call MMapDirectory.setUseUnmap(false)
> >>> to turn off this tracking; this means you rely on GC to (eventually)
> >>> unmap.
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>>
> >>> On Wed, Aug 7, 2013 at 2:45 AM, Denis Bazhenov <bazhenov@farpost.com>
> >>> wrote:
> >>>> We have upgraded from Lucene 3.6 to 4.4. In production we faced high
> >>>> minor GC time. A heap dump showed that one of the biggest objects by
> >>>> size is org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference:
> >>>> about 11 million instances with about 377 megabytes of memory in total
> >>>> (this is not even retained size). Here is a screenshot of the JProfiler
> >>>> output:
> >>>> https://dl.dropboxusercontent.com/u/16254496/Screen%20Shot%202013-08-07%20at%205.35.22%20PM.png
> >>>>
> >>>> The keys of the map are MMapIndexInput. What is this map for and how
> >>>> can I reduce its memory usage?
> >>>> ---
> >>>> Denis Bazhenov <bazhenov@farpost.com> FarPost.
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >
> > ---
> > Denis Bazhenov <dotsid@gmail.com>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

