lucene-solr-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Memory use during merges (OOM)
Date Thu, 16 Dec 2010 21:22:09 GMT
On Thu, Dec 16, 2010 at 4:03 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
>>>Your setting isn't being applied to the reader IW uses during
>>>merging... it's only for readers Solr opens from directories
>>>explicitly.
>>>I think you should open a jira issue!
>
> Do I understand correctly that this setting in theory could be applied
> to the reader IW uses during merging but is not currently being applied?

yes. given the "name=" attribute, I'm not really sure whether you can
have (or whether it was planned to support) multiple IR factories in
solr, e.g. a separate one for spellchecking.
so I'm not sure if we should (hackishly) steal this parameter from the
IR factory (it is common to all IRFactories, not just
StandardIRFactory) and apply it to IW.

but we could at least expose the divisor param separately to the IW
config so you have some way of setting it.

>
> <indexReaderFactory name="IndexReaderFactory" class="org.apache.solr.core.StandardIndexReaderFactory">
>    <int name="termInfosIndexDivisor">8</int>
>  </indexReaderFactory>
>
> I understand the tradeoffs for doing this during searching, but not the
> trade-offs for doing this during merging.  Is the use during merging
> similar to the use during searching?
>
>  i.e. Some process has to look up data for a particular term as opposed
>  to having to iterate through all the terms?
>  (Haven't yet dug into the merging/indexing code).

the reader IW uses needs the terms index for applying deletes...

as a workaround (if you are reindexing), maybe instead of using the
Terms Index Divisor = 8 you could set the Terms Index Interval = 1024
(8 * 128, where 128 is the default interval)?

this will solve your merging problem, and have the same perf
characteristics as divisor=8, except you can't "go back down" like you
can with the divisor without reindexing with a smaller interval...
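
to make the arithmetic concrete, here is a rough sketch in Python. it assumes the simple model that every interval-th term is written to the terms index at index time and that a reader then keeps every divisor-th of those entries in memory. the function names and the term count are made up for illustration, and this is not Lucene code:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large counts."""
    return -(-a // b)


def effective_interval(term_index_interval: int, divisor: int) -> int:
    """Spacing (in terms) between the index entries a reader keeps in memory."""
    return term_index_interval * divisor


def entries_in_memory(total_terms: int, term_index_interval: int, divisor: int) -> int:
    """Approximate number of terms-index entries a reader loads (simplified model)."""
    written = ceil_div(total_terms, term_index_interval)  # written at index time
    return ceil_div(written, divisor)                     # subsampled at load time


TOTAL_TERMS = 2_000_000_000  # hypothetical number of unique terms

# Search-time reader with the divisor applied: interval=128, divisor=8.
search_entries = entries_in_memory(TOTAL_TERMS, 128, 8)

# Merge-time reader, where the divisor is not applied (effectively divisor=1).
merge_entries_128 = entries_in_memory(TOTAL_TERMS, 128, 1)

# After reindexing with interval=1024 and divisor=1, every reader sees this.
merge_entries_1024 = entries_in_memory(TOTAL_TERMS, 1024, 1)

print(effective_interval(128, 8), effective_interval(1024, 1))
print(search_entries, merge_entries_128, merge_entries_1024)
```

under this model the merge-time reader with interval=128 holds 8x as many entries as the search-time reader, while interval=1024 gives both readers the same, smaller footprint.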

if you've already tested that performance with the divisor of 8 is
acceptable, or in your case maybe necessary!, it sort of makes sense
to 'bake it in' by setting your divisor back to 1 and your interval =
1024 instead...
