lucene-solr-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: JVM GC Issue
Date Sun, 03 Dec 2017 01:59:33 GMT
I am a bit curious about the docValues implementation.
I understand that docValues do not use JVM memory and
that they use the OS cache instead - that is why they are more performant.

But to return any response from docValues, the values in the
docValues' column-oriented structures would need to be brought
into the JVM's memory. That would then increase the pressure
on the JVM's memory anyway. So how do docValues actually
help from a memory perspective?

Thanks
SG


On Sat, Dec 2, 2017 at 12:39 AM, Dominique Bejean <dominique.bejean@eolya.fr
> wrote:

> Hi, Thank you for the explanations about faceting. I was thinking the hit
> count had a bigger impact on the facet memory lifecycle. Regardless of the
> hit count, there is a query peak at the time the issue occurs. This is
> modest relative to what Solr is supposed to be able to handle, but it
> should be sufficient to explain the growing GC activity (queries per minute):
>
>     198 10:07
>     208 10:08
>     267 10:09
>     285 10:10
>     244 10:11
>     286 10:12
>     277 10:13
>     252 10:14
>     183 10:15
>     302 10:16
>     299 10:17
>     273 10:18
>     348 10:19
>     468 10:20
>     496 10:21
>     673 10:22
>     496 10:23
>     101 10:24
>
> At the time the issue occurs, we see the CPU activity grow very high.
> Maybe there is a lack of CPU. So, I will suggest all actions that will
> remove pressure on heap memory.
>
>
>    - enable docValues
>    - divide cache size per 2 in order go back to Solr default
>    - refine the fl parameter, as I know it can be optimized
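The cache reduction mentioned above could look like the following solrconfig.xml fragment. This is only a sketch: the sizes shown are Solr's shipped defaults, and the cache classes and values actually in use in this deployment are assumptions.

```xml
<!-- Sketch only: the deployment's current sizes/classes are unknown.
     These are Solr's default cache settings. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
```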
>
> Concerning the phonetic filter, it will be removed anyway, as a large
> number of results are really irrelevant. Regards. Dominique
>
>
> On Sat, Dec 2, 2017 at 04:25, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > Dominique:
> >
> > Actually, the memory requirements shouldn't really go up as the number
> > of hits increases. The general algorithm is (say rows=10):
> > Calculate the score of each doc.
> > If the score is zero, ignore the doc.
> > If the score is > the minimum in my current top 10, replace the
> > lowest-scoring doc in my current top 10 with the new doc (a
> > PriorityQueue, last I knew).
> > Else discard the doc.
> >
> > When all docs have been scored, assemble the return from the top 10
> > (or whatever rows is set to).
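The top-N collection Erick describes can be sketched as follows. This is a minimal illustration of the idea, not Lucene's actual TopScoreDocCollector; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of bounded top-N collection: memory stays O(rows) no matter
// how many documents match, because only `rows` scores are ever kept.
public class TopNSketch {
    public static List<Double> topN(double[] scores, int rows) {
        // Min-heap: the head is always the lowest score currently kept.
        PriorityQueue<Double> heap = new PriorityQueue<>(rows);
        for (double score : scores) {
            if (score <= 0) continue;             // score of zero: ignore
            if (heap.size() < rows) {
                heap.add(score);
            } else if (score > heap.peek()) {     // beats the current minimum
                heap.poll();                      // evict the lowest scorer
                heap.add(score);
            }                                     // else: discard the doc
        }
        // When all docs are scored, assemble the return in rank order.
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```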
> >
> > The key here is that most of the Solr index is kept in
> > MMapDirectory/OS space; see Uwe's excellent blog here:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
> > In terms of _searching_, very little of the Lucene index structures
> > is kept in memory.
> >
> > That said, faceting plays a bit loose with the rules. If you have
> > docValues set to true, most of the memory structures are in the OS
> > memory space, not the JVM. If you have docValues set to false, then
> > the "uninverted" structure is built in the JVM heap space.
> >
> > Additionally, the JVM requirements are sensitive to the number of
> > unique values in the field being faceted on. For instance, let's say you
> > faceted by a date field with just facet.field=some_date_field. A
> > bucket would have to be allocated to hold the counts for each and
> > every unique date value, i.e. one for each millisecond in your search,
> > which might be something you're seeing. Conceptually this is just an
> > array[uniqueValues] of ints (longs? I'm not sure). This should be
> > relatively easy to test by omitting the facets while measuring.
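The "array[uniqueValues] of ints" cost can be sketched as below. This is a toy illustration of the counting structure, not Lucene's per-segment ordinal machinery; the class and method are invented for the example. With millisecond-precision dates, the ordinal map grows toward one bucket per document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of facet counting: one int bucket per unique field value.
public class FacetCountSketch {
    public static int[] countFacets(List<String> fieldValues) {
        // Assign each unique value an ordinal (Lucene does something
        // similar per segment).
        Map<String, Integer> ordinals = new HashMap<>();
        for (String v : fieldValues) {
            ordinals.putIfAbsent(v, ordinals.size());
        }
        // The memory cost described above: array[uniqueValues] of ints.
        // A millisecond-precision date field makes this array huge.
        int[] counts = new int[ordinals.size()];
        for (String v : fieldValues) {
            counts[ordinals.get(v)]++;
        }
        return counts;
    }
}
```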
> >
> > Where the number of rows _does_ make a difference is in the return
> > packet. Say I have rows=10. In that case I create a single return
> > packet with all 10 docs' "fl" fields. If rows=10,000 then that return
> > packet is obviously 1,000 times as large and must be assembled in
> > memory.
> >
> > I rather doubt the phonetic filter is to blame. But you can test this
> > by just omitting the field containing the phonetic filter in the
> > search query. I've certainly been wrong before.....
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
> > <dominique.bejean@eolya.fr> wrote:
> > > Hi,
> > >
> > >
> > > Thank you both for your responses.
> > >
> > >
> > > I only have the solr log for the very last period of the GC log.
> > >
> > >
> > > A grep command allows me to count queries per minute with hits > 1000
> > > or > 10000, and so with the biggest impact on memory and CPU during
> > > faceting.
> > >
> > >
> > >> 1000
> > >
> > >      59 11:13
> > >
> > >      45 11:14
> > >
> > >      36 11:15
> > >
> > >      45 11:16
> > >
> > >      59 11:17
> > >
> > >      40 11:18
> > >
> > >      95 11:19
> > >
> > >     123 11:20
> > >
> > >     137 11:21
> > >
> > >     123 11:22
> > >
> > >      86 11:23
> > >
> > >      26 11:24
> > >
> > >      19 11:25
> > >
> > >      17 11:26
> > >
> > >
> > >> 10000
> > >
> > >      55 11:19
> > >
> > >      78 11:20
> > >
> > >      48 11:21
> > >
> > >     134 11:22
> > >
> > >      93 11:23
> > >
> > >      10 11:24
> > >
> > >
> > > So we see that at the time GC starts to go nuts, the large result set
> > > count increases.
> > >
> > >
> > > The query field includes a phonetic filter and results are really not
> > > relevant due to this. I will suggest to:
> > >
> > >
> > > 1/ remove the phonetic filter in order to have fewer irrelevant
> > > results and so get a smaller result set
> > >
> > > 2/ enable docValues on the fields used for faceting
> > >
> > >
> > > I expect this to decrease GC requirements and stabilize GC.
> > >
> > >
> > > Regards
> > >
> > >
> > > Dominique
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Dec 1, 2017 at 18:17, Erick Erickson <erickerickson@gmail.com>
> > > wrote:
> > >
> > >> Your autowarm counts are rather high, but as Toke says this doesn't
> > >> seem outrageous.
> > >>
> > >> I have seen situations where Solr is running close to the limits of
> > >> its heap and GC only reclaims a tiny bit of memory each time. When
> > >> you say "full GC with no memory reclaimed", is that really no memory
> > >> _at all_? Or "almost no memory"? This situation can be alleviated by
> > >> allocating more memory to the JVM.
> > >>
> > >> Your JVM pressure would certainly be reduced by enabling docValues on
> > >> any field you sort, facet or group on. That would require a full
> > >> reindex, of course. Note that this makes your index on disk bigger,
> > >> but reduces JVM pressure by roughly the same amount, so it's a win in
> > >> this situation.
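Enabling docValues is a per-field schema change, for example in schema.xml as below. The field names and types here are hypothetical, and as noted above, a full reindex is required after the change.

```xml
<!-- Hypothetical field names; docValues changes require a full reindex. -->
<field name="brand_s"    type="string" indexed="true" stored="true"
       docValues="true"/>
<field name="created_dt" type="pdate"  indexed="true" stored="true"
       docValues="true"/>
```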
> > >>
> > >> Have you attached a memory profiler to the running Solr instance? I'd
> > >> be curious where the memory is being allocated.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen <toes@kb.dk> wrote:
> > >> > Dominique Bejean <dominique.bejean@eolya.fr> wrote:
> > >> >> We are encountering issue with GC.
> > >> >
> > >> >> Randomly, nearly once a day, there are consecutive full GCs with
> > >> >> no memory reclaimed.
> > >> >
> > >> > [... 1.2M docs, Xmx 6GB ...]
> > >> >
> > >> >> GCeasy suggests increasing the heap size, but I do not agree
> > >> >
> > >> > It does seem strange, with your apparently modest index & workload.
> > >> > Nothing you say sounds problematic to me, and you have covered the
> > >> > usual culprits: overlapping searchers, faceting and filterCache.
> > >> >
> > >> > Is it possible for you to share the solr.log around the two times
> > >> > that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01
> > >> > 08:00-12:00.
> > >> >
> > >> > If you cannot share, please check if you have excessive traffic
> > >> > around that time or if there is a lot of UnInverting going on
> > >> > (triggered by faceting on non-docValues String fields). I know your
> > >> > post implies that you have already done so, so this is more of a
> > >> > sanity check.
> > >> >
> > >> >
> > >> > - Toke Eskildsen
> > >>
> > > --
> > > Dominique Béjean
> > > 06 08 46 12 43
> >
> --
> Dominique Béjean
> 06 08 46 12 43
>
