lucene-solr-user mailing list archives

From Nitin Solanki <nitinml...@gmail.com>
Subject Re: Whole RAM consumed while Indexing.
Date Fri, 20 Mar 2015 06:55:08 GMT
Hi Erick,
           I read about the mergeFactor policy for indexing. By default, mergeFactor
is 10. As the documentation says,

High value merge factor (e.g., 25):

   - Pro: Generally improves indexing speed
   - Con: Less frequent merges, resulting in a collection with more index
   files which may slow searching

Low value merge factor (e.g., 2):

   - Pro: Smaller number of index files, which speeds up searching.
   - Con: More segment merges slow down indexing.

So, my main purpose is **searching**. Searching must be fast. Therefore, if
I set **mergeFactor = 2**, indexing will be slow but searching may be
fast, right?

Once again, to recap: I am indexing 28GB of data in total, 20000
documents at a time, which triggers commits every 15 seconds (hard commit)
and 10 mins (soft commit).
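For context, the batching described above can be sketched as follows. The URL, collection name, and helper names are assumptions for illustration, not from the original script; no explicit commit is sent, so the autoCommit/autoSoftCommit settings under discussion decide when documents become visible:

```python
import json
import urllib.request

# Assumed values for illustration; adjust to your host and collection.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"
BATCH_SIZE = 20000

def chunks(docs, size=BATCH_SIZE):
    """Yield successive batches of `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def post_batch(batch):
    """POST one batch as JSON. No explicit commit parameter, so
    autoCommit/autoSoftCommit in solrconfig.xml control visibility."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def index_all(docs):
    for batch in chunks(docs):
        post_batch(batch)
```

The point of batching is that each HTTP request carries 20000 documents, so commit frequency is decoupled from request frequency.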

Will searching be fast if I set **mergeFactor = 2**, and what should the
values be for ramBufferSizeMB, maxBufferedDocs, and maxIndexingThreads?

Right now, all values are set to their defaults.
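Since the question is about where these knobs live, here is a minimal solrconfig.xml sketch. The values are illustrative assumptions only, not recommendations; also note that on recent Solr versions mergeFactor is superseded by TieredMergePolicy's maxMergeAtOnce/segmentsPerTier settings:

```xml
<indexConfig>
  <!-- Illustrative: a low mergeFactor means fewer, larger segments,
       which favors search speed at the cost of indexing speed. -->
  <mergeFactor>2</mergeFactor>

  <!-- In-memory buffer; a new segment is flushed when either limit is hit. -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>100000</maxBufferedDocs>

  <!-- Upper bound on concurrent indexing threads inside Lucene. -->
  <maxIndexingThreads>8</maxIndexingThreads>
</indexConfig>
```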

On Fri, Mar 20, 2015 at 11:42 AM, Nitin Solanki <nitinmlvya@gmail.com>
wrote:

>
>
> On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> That or even hard commit to 60 seconds. It's strictly a matter of how
>> often you want to close old segments and open new ones.
>>
>> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki <nitinmlvya@gmail.com>
>> wrote:
>> > Hi Erick..
>> >               I read your article. Really nice...
>> > Inside it you said that for bulk indexing, set soft commit = 10 mins
>> > and hard commit = 15 sec. Is that also okay for my scenario?
>> >
>> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <
>> > erickerickson@gmail.com> wrote:
>> >
>> >> bq: As you said, do commits after 60000 seconds
>> >>
>> >> No, No, No. I'm NOT saying 60000 seconds! That time is in
>> >> _milliseconds_, as Shawn said. So setting it to 60000 is every minute.
>> >>
>> >> From solrconfig.xml, conveniently located immediately above the
>> >> <autoCommit> tag:
>> >>
>> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
>> >> document was added before automatically triggering a new commit.
>> >>
>> >> Also, a lot of answers about soft and hard commits are here, as I
>> >> pointed out before; did you read it?
>> >>
>> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>> >> <arafalov@gmail.com> wrote:
>> >> > Probably merged somewhat differently, with some term indexes
>> >> > repeating between segments. Check the number of segments in the
>> >> > data directory. And do a search for *:* and make sure both have the
>> >> > same document counts.
>> >> >
>> >> > Also, in all these discussions, you still haven't answered how soon
>> >> > after indexing you want to _search_. Because, if you are not
>> >> > actually searching while committing, you could even index on a
>> >> > completely separate server (e.g. a faster one) and swap (or alias)
>> >> > the index in afterwards. Unless, of course, I missed it; there are
>> >> > a lot of emails in a very short window of time.
>> >> >
>> >> > Regards,
>> >> >    Alex.
>> >> >
>> >> > ----
>> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> >> > http://www.solr-start.com/
>> >> >
>> >> >
>> >> > On 18 March 2015 at 12:09, Nitin Solanki <nitinmlvya@gmail.com>
>> >> > wrote:
>> >> >> When I kept my configuration at 300 for soft commit and 3000 for
>> >> >> hard commit and indexed some amount of data, I got the data size
>> >> >> of the whole index to be 6GB after completing the indexing.
>> >> >>
>> >> >> When I changed the configuration to 60000 for soft commit and
>> >> >> 60000 for hard commit and indexed the same data, I got the data
>> >> >> size of the whole index to be 5GB after completing the indexing.
>> >> >>
>> >> >> But the number of documents in both scenarios was the same. I am
>> >> >> wondering how that can be possible?
>> >> >>
>> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <
>> >> >> nitinmlvya@gmail.com> wrote:
>> >> >>
>> >> >>> Hi Erick,
>> >> >>>              I am just asking because I want to be sure about the
>> >> >>> difference between commit settings, whether I do frequent commits
>> >> >>> or not. The reason I need to commit things so very quickly is
>> >> >>> that I have to index 28GB of data, which takes 7-8 hours with
>> >> >>> frequent commits.
>> >> >>> As you said, do commits after 60000 seconds, then it will be more
>> >> >>> expensive. If I don't encounter the **"overlapping searchers"
>> >> >>> warning messages**, then I feel it seems to be okay. Is it?
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <
>> >> >>> erickerickson@gmail.com> wrote:
>> >> >>>
>> >> >>>> Don't do it. Really, why do you want to do this? This seems like
>> >> >>>> an "XY" problem; you haven't explained why you need to commit
>> >> >>>> things so very quickly.
>> >> >>>>
>> >> >>>> I suspect you haven't tried _searching_ while committing at such
>> >> >>>> a rate, and you might as well turn all your top-level caches off
>> >> >>>> in solrconfig.xml since they won't be useful at all.
>> >> >>>>
>> >> >>>> Best,
>> >> >>>> Erick
>> >> >>>>
>> >> >>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <
>> >> >>>> nitinmlvya@gmail.com> wrote:
>> >> >>>> > Hi,
>> >> >>>> >        If I do very, very fast indexing (softcommit = 300 and
>> >> >>>> > hardcommit = 3000) versus slow indexing (softcommit = 60000
>> >> >>>> > and hardcommit = 60000), as you both said, will fast indexing
>> >> >>>> > fail to index some data?
>> >> >>>> > Any suggestion on this?
>> >> >>>> >
>> >> >>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
>> >> >>>> > andyetitmoves@gmail.com> wrote:
>> >> >>>> >
>> >> >>>> >> Yes, and doing so is painful and takes lots of people and
>> >> >>>> >> hardware resources to get there for large amounts of data
>> >> >>>> >> and queries :)
>> >> >>>> >>
>> >> >>>> >> As Erick says, work backwards from 60s and first establish
>> >> >>>> >> how high the commit interval can be to satisfy your use case.
>> >> >>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerickson@gmail.com>
>> >> >>>> >> wrote:
>> >> >>>> >>
>> >> >>>> >> > First start by lengthening your soft and hard commit
>> >> >>>> >> > intervals substantially. Start with 60000 and work
>> >> >>>> >> > backwards, I'd say.
>> >> >>>> >> >
>> >> >>>> >> > Ramkumar has tuned the heck out of his installation to get
>> >> >>>> >> > the commit intervals to be that short ;).
>> >> >>>> >> >
>> >> >>>> >> > I'm betting that you'll see your RAM usage go way down,
>> >> >>>> >> > but that's a guess until you test.
>> >> >>>> >> >
>> >> >>>> >> > Best,
>> >> >>>> >> > Erick
>> >> >>>> >> >
>> >> >>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <
>> >> >>>> >> > nitinmlvya@gmail.com> wrote:
>> >> >>>> >> > > Hi Erick,
>> >> >>>> >> > >             You are correct. **"Overlapping searchers"
>> >> >>>> >> > > warning messages** are coming in the logs.
>> >> >>>> >> > > **numDocs numbers** are changing while documents are
>> >> >>>> >> > > being added during indexing.
>> >> >>>> >> > > Any help?
>> >> >>>> >> > >
>> >> >>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <
>> >> >>>> >> > > erickerickson@gmail.com> wrote:
>> >> >>>> >> > >
>> >> >>>> >> > >> First, the soft commit interval is very short. Very,
>> >> >>>> >> > >> very, very, very short. 300ms is just short of insane
>> >> >>>> >> > >> unless it's a typo ;).
>> >> >>>> >> > >>
>> >> >>>> >> > >> Here's a long background:
>> >> >>>> >> > >>
>> >> >>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >> >>>> >> > >>
>> >> >>>> >> > >> But the short form is that you're opening searchers every
>> >> >>>> >> > >> 300 ms. The hard commit is better, but every 3 seconds is
>> >> >>>> >> > >> still far too short IMO. I'd start with soft commits of
>> >> >>>> >> > >> 60000 and hard commits of 60000 (60 seconds), meaning that
>> >> >>>> >> > >> you're going to have to wait 1 minute for docs to show up
>> >> >>>> >> > >> unless you explicitly commit.
>> >> >>>> >> > >>
>> >> >>>> >> > >> You're throwing away all the caches configured in
>> >> >>>> >> > >> solrconfig.xml more than 3 times a second, executing
>> >> >>>> >> > >> autowarming, etc, etc, etc....
>> >> >>>> >> > >>
>> >> >>>> >> > >> Changing these to longer intervals might cure the problem,
>> >> >>>> >> > >> but if not then, as Hoss would say, "details matter". I
>> >> >>>> >> > >> suspect you're also seeing "overlapping searchers" warning
>> >> >>>> >> > >> messages in your log, and it's _possible_ that what's
>> >> >>>> >> > >> happening is that you're just exceeding the max warming
>> >> >>>> >> > >> searchers and never opening a new searcher with the
>> >> >>>> >> > >> newly-indexed documents. But that's a total shot in the
>> >> >>>> >> > >> dark.
>> >> >>>> >> > >>
>> >> >>>> >> > >> How are you looking for docs (and not finding them)? Does
>> >> >>>> >> > >> the numDocs number in the solr admin screen change?
>> >> >>>> >> > >>
>> >> >>>> >> > >> Best,
>> >> >>>> >> > >> Erick
>> >> >>>> >> > >>
>> >> >>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <
>> >> >>>> >> > >> nitinmlvya@gmail.com> wrote:
>> >> >>>> >> > >> > Hi Alexandre,
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > *Hard Commit* is:
>> >> >>>> >> > >> >
>> >> >>>> >> > >> >      <autoCommit>
>> >> >>>> >> > >> >        <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>> >> >>>> >> > >> >        <openSearcher>false</openSearcher>
>> >> >>>> >> > >> >      </autoCommit>
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > *Soft Commit* is:
>> >> >>>> >> > >> >
>> >> >>>> >> > >> >      <autoSoftCommit>
>> >> >>>> >> > >> >        <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>> >> >>>> >> > >> >      </autoSoftCommit>
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > And I am committing 20000 documents each time.
>> >> >>>> >> > >> > Is this a good config for committing?
>> >> >>>> >> > >> > Or am I doing something wrong?
>> >> >>>> >> > >> >
>> >> >>>> >> > >> >
>> >> >>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <
>> >> >>>> >> > >> > arafalov@gmail.com> wrote:
>> >> >>>> >> > >> >
>> >> >>>> >> > >> >> What's your commit strategy? Explicit commits? Soft
>> >> >>>> >> > >> >> commits/hard commits (in solrconfig.xml)?
>> >> >>>> >> > >> >>
>> >> >>>> >> > >> >> Regards,
>> >> >>>> >> > >> >>    Alex.
>> >> >>>> >> > >> >> ----
>> >> >>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
>> >> >>>> >> > >> >> newsletter:
>> >> >>>> >> > >> >> http://www.solr-start.com/
>> >> >>>> >> > >> >>
>> >> >>>> >> > >> >>
>> >> >>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <
>> >> >>>> >> > >> >> nitinmlvya@gmail.com> wrote:
>> >> >>>> >> > >> >> > Hello,
>> >> >>>> >> > >> >> >           I have written a python script that
>> >> >>>> >> > >> >> > indexes 20000 documents at a time into Solr. I
>> >> >>>> >> > >> >> > have 28 GB RAM with 8 CPUs.
>> >> >>>> >> > >> >> > When I started indexing, 15 GB of RAM was free.
>> >> >>>> >> > >> >> > While indexing, all the RAM is consumed but
>> >> >>>> >> > >> >> > **not** a single document is indexed. Why so?
>> >> >>>> >> > >> >> > And it throws *HTTPError: HTTP Error 503: Service
>> >> >>>> >> > >> >> > Unavailable* in the python script.
>> >> >>>> >> > >> >> > I think it is due to heavy load on Zookeeper, by
>> >> >>>> >> > >> >> > which all nodes went down, but I am not sure
>> >> >>>> >> > >> >> > about that. Any help please..
>> >> >>>> >> > >> >> > Or is anything else happening?
>> >> >>>> >> > >> >> > And how do I overcome this issue?
>> >> >>>> >> > >> >> > Please point me towards the right path.
>> >> >>>> >> > >> >> > Thanks..
>> >> >>>> >> > >> >> >
>> >> >>>> >> > >> >> > Warm Regards,
>> >> >>>> >> > >> >> > Nitin Solanki
>> >> >>>> >> > >> >>
>> >> >>>> >> > >>
>> >> >>>> >> >
>> >> >>>> >>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >>
>>
>
>
