Reply-To: solr-user@lucene.apache.org
MIME-Version: 1.0
Date: Fri, 19 Oct 2012 12:53:30 +0800
Message-ID: 
 <CAAfq8iSnMy+npKGYmZxma0ha6rPtV=46H2fzaS+zqs8SC72VtQ@mail.gmail.com>
Subject: Solr 4.0 segment flush frequency differs greatly between two
 machines
From: Jun Wang <wangjundot@gmail.com>
To: solr-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7b2e119dffbfb104cc62482e

--047d7b2e119dffbfb104cc62482e
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I have 2 machines serving one collection, and the data is imported with
DIH. DIH is triggered via a URL request on one machine, let's call it A,
and A forwards part of the index data to machine B. Recently I found that
segment flushes happen more often on machine B. Here is part of
INFOSTREAM.txt.

Machine A:
----------------------------
DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment
_4r3 numDocs=71616
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0 deleted
docs
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
vectors; no norms; no docValues; prox; freqs
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
_4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
D

Machine B:
----------------------------------
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
as segment _zi0 numDocs=4302
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has
0 deleted docs
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment has
no vectors; no norms; no docValues; prox; freqs
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
_zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
codec=Lucene40
D
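
To compare the two machines over a whole INFOSTREAM.txt rather than a single
entry, the numDocs value of every flush can be pulled out with a small
script. This is only a sketch: the function name flushed_doc_counts and the
sample string are made up for illustration, and the pattern is based solely
on the "flush postings as segment ... numDocs=..." entries shown above.

```python
import re

# Matches entries like "flush postings as segment _4r3 numDocs=71616".
# \s+ is used between words because the mail client wrapped the log lines.
FLUSH_RE = re.compile(r"flush\s+postings\s+as\s+segment\s+(\S+)\s+numDocs=(\d+)")


def flushed_doc_counts(infostream_text):
    """Return a list of (segment_name, num_docs), one per flush entry."""
    return [(name, int(n)) for name, n in FLUSH_RE.findall(infostream_text)]


# Sample built from the two flush entries quoted in this message.
sample = """DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as segment
_4r3 numDocs=71616
DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
as segment _zi0 numDocs=4302
"""

for segment, num_docs in flushed_doc_counts(sample):
    print(segment, num_docs)
```

Running it against the full file from each machine would show whether the
small segments on B are the rule or only occasional.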

I have found that on machine A a flush occurs when the number of docs in
RAM reaches 70000~90000, but on machine B the number is very different,
almost always about 4000. It seems that every doc in the buffer uses more
RAM on machine B than on machine A, which results in more flushes. Does
anyone know why this happens?

Here is my conf:

<ramBufferSizeMB>64</ramBufferSizeMB>
<maxBufferedDocs>100000</maxBufferedDocs>
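
As a rough check of the "more RAM per doc" guess, the implied buffer usage
per document can be estimated from the numbers above. This is a sketch under
two assumptions that the logs do not confirm: that each flush was triggered
by the 64 MB RAM limit (not by maxBufferedDocs or a commit), and that the
flushed segment held the whole buffer. The helper name approx_bytes_per_doc
is made up for illustration.

```python
RAM_BUFFER_MB = 64  # ramBufferSizeMB from the conf above


def approx_bytes_per_doc(num_docs, ram_buffer_mb=RAM_BUFFER_MB):
    """Estimate buffered bytes per doc, assuming the full buffer was flushed."""
    return ram_buffer_mb * 1024 * 1024 / num_docs


bytes_a = approx_bytes_per_doc(71616)  # numDocs of segment _4r3 on machine A
bytes_b = approx_bytes_per_doc(4302)   # numDocs of segment _zi0 on machine B

print(f"A: ~{bytes_a:.0f} bytes/doc")
print(f"B: ~{bytes_b:.0f} bytes/doc")
print(f"B/A ratio: ~{bytes_b / bytes_a:.1f}")
```

Under these assumptions B would need well over ten times as much buffer per
document as A. If several threads index at the same time on B, each flushed
segment may hold only part of the buffer, so the real per-doc figure on B
could be lower than this estimate.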


-- 
from Jun Wang

--047d7b2e119dffbfb104cc62482e--