lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34930] - IndexWriter.maybeMergeSegments() takes lots of CPU resources
Date Mon, 16 May 2005 17:17:24 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34930>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34930





------- Additional Comments From cutting@apache.org  2005-05-16 19:17 -------
Your benchmark might run faster with a smaller maxBufferedDocs.  Also, it
doesn't look like your benchmark statistics include the cost of closing the
IndexWriter.  They should: with such a large buffer, you've deferred much of
the work to that point.

The bottleneck you're hitting is that maybeMergeSegments sums the sizes of the
buffered segments every time it is called, to decide whether to merge.  When
you have thousands buffered, this summing dominates.
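
To put a rough number on that, here is a small model of the cost.  It is not
Lucene code: ScanCost and totalVisits are hypothetical names, and it assumes
each buffered document sits in its own one-document segment and that every
addDocument call walks all of them once.

```java
// Hypothetical model, not Lucene code: counts how many buffered segments are
// visited when every addDocument call re-sums all one-document segments.
public class ScanCost {

    // Total segment visits needed to buffer `bufferedDocs` documents
    // before the first merge, under the assumptions stated above.
    static long totalVisits(int bufferedDocs) {
        long visits = 0;
        for (int buffered = 1; buffered <= bufferedDocs; buffered++) {
            // One merge check walks every segment buffered so far.
            visits += buffered;
        }
        return visits; // equals n * (n + 1) / 2
    }

    public static void main(String[] args) {
        System.out.println(totalVisits(1000));   // 500500
        System.out.println(totalVisits(10000));  // 50005000
    }
}
```

Under those assumptions the work grows quadratically with the buffer size, so a
10x larger buffer costs about 100x more in merge checks per flush.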

To optimize this case (small docs, large maxBufferedDocs) we could keep a count
of the buffered documents in a new bufferedDocCount field.  addDocument could
increment it, mergeSegments could decrement it, and maybeMergeSegments could
check it with something like:

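if (targetMergeDocs == minMergeDocs) {
  // first level: the buffered docs are the merge candidates, so no scan
  mergeDocs = bufferedDocCount;
} else {
  // otherwise scan the segments as before
  while (--minSegment >= 0) {
    ...
  }
}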
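
A self-contained sketch of the same idea, for illustration only.  This is not
the IndexWriter source: BufferedCountSketch and its methods are hypothetical
stand-ins, segments are modelled as plain document counts, and only the first
merge level (targetMergeDocs == minMergeDocs) is covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the buffering logic; not the real IndexWriter.
public class BufferedCountSketch {
    private final int minMergeDocs;   // flush threshold for buffered docs
    private final List<Integer> segmentDocCounts = new ArrayList<>();
    private int bufferedDocCount = 0; // docs held in unmerged one-doc segments

    public BufferedCountSketch(int minMergeDocs) {
        this.minMergeDocs = minMergeDocs;
    }

    public void addDocument() {
        segmentDocCounts.add(1);      // each new doc starts as its own segment
        bufferedDocCount++;           // incremented on add
        maybeMergeSegments();
    }

    private void maybeMergeSegments() {
        // Constant-time check: compare the counter instead of re-summing
        // the sizes of all buffered segments.
        if (bufferedDocCount >= minMergeDocs) {
            mergeBufferedSegments();
        }
    }

    private void mergeBufferedSegments() {
        int merged = bufferedDocCount;
        int start = segmentDocCounts.size() - merged;
        // Replace the trailing one-doc segments with a single merged segment.
        segmentDocCounts.subList(start, segmentDocCounts.size()).clear();
        segmentDocCounts.add(merged);
        bufferedDocCount -= merged;   // decremented on merge
    }

    public int segmentCount() { return segmentDocCounts.size(); }

    public int bufferedDocCount() { return bufferedDocCount; }

    public static void main(String[] args) {
        BufferedCountSketch writer = new BufferedCountSketch(3);
        for (int i = 0; i < 7; i++) {
            writer.addDocument();
        }
        // Two merged segments of 3 docs plus 1 doc still buffered.
        System.out.println(writer.segmentCount());     // 3
        System.out.println(writer.bufferedDocCount()); // 1
    }
}
```

The counter has to be adjusted in exactly the two places named above (add and
merge), otherwise it drifts away from the real number of buffered documents.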

Does that make sense?


