lucene-dev mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject RE: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Thu, 22 Mar 2007 20:07:58 GMT

"Steven Parkes" <steven_parkes@esseff.org> wrote:
>   * Merge policy has problems when you "flush by RAM" (this is true
>     even before my patch).  Not sure how to fix yet.
> 
> Do you mean where one would be trying to use RAM usage to determine when
> to do a flush? 

Right: if your indexer measures RAM usage
(writer.ramSizeInBytes()) after each added doc and flushes whenever
that crosses X MB, then depending on the value of maxBufferedDocs you
may over-merge.
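The flush-by-RAM pattern described above could be sketched as follows. `FakeWriter` is a hypothetical stand-in for Lucene's `IndexWriter` (the real class exposes `ramSizeInBytes()` as referenced in this message); it exists only to show the check-after-every-add loop shape:

```java
// Sketch of flushing by RAM usage rather than by document count.
// FakeWriter is a hypothetical stand-in for org.apache.lucene.index.IndexWriter,
// used only to illustrate the loop; it is not Lucene code.
class FakeWriter {
    private long ram = 0;
    int flushes = 0;
    void addDocument(int approxDocBytes) { ram += approxDocBytes; }
    long ramSizeInBytes() { return ram; }
    void flush() { ram = 0; flushes++; }
}

public class FlushByRam {
    static final long MAX_RAM = 16 * 1024 * 1024; // flush once buffered docs exceed 16 MB

    static int index(FakeWriter writer, int numDocs, int docBytes) {
        for (int i = 0; i < numDocs; i++) {
            writer.addDocument(docBytes);
            // Segment size is now driven by RAM, not by maxBufferedDocs.
            if (writer.ramSizeInBytes() >= MAX_RAM) {
                writer.flush();
            }
        }
        return writer.flushes;
    }

    public static void main(String[] args) {
        // 1000 docs of ~100 KB each: flushes happen every ~164 docs,
        // regardless of whatever maxBufferedDocs is set to.
        System.out.println(index(new FakeWriter(), 1000, 100 * 1024));
    }
}
```

The point is that the number of docs per flushed segment becomes a function of document size, so it can land far below maxBufferedDocs.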

E.g., if you set maxBufferedDocs to say 10000, but it turns out based
on RAM usage you actually flush every 300 docs, then the merge policy
will incorrectly merge a level 1 segment (with 3000 docs) in with the
level 0 segments (with 300 docs).  This is because the merge policy
looks at the current value of maxBufferedDocs to compute the levels,
so a 3000 doc segment and a 300 doc segment both look like "level 0".
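The over-merge falls out of the level arithmetic. A sketch of a docCount-based level computation (an illustration, not Lucene's actual merge-policy code): any segment smaller than maxBufferedDocs lands at level 0, so with maxBufferedDocs=10000 a merged 3000-doc segment and a fresh 300-doc flush compute to the same level:

```java
// Sketch (not Lucene's actual code) of a docCount-based merge policy's
// level assignment: each level's threshold grows by mergeFactor, and
// every segment below maxBufferedDocs lands at level 0.
public class MergeLevels {
    static int level(int docCount, int maxBufferedDocs, int mergeFactor) {
        int lvl = 0;
        long threshold = maxBufferedDocs;
        while (docCount >= threshold) { // segment is big enough for a higher level
            lvl++;
            threshold *= mergeFactor;
        }
        return lvl;
    }

    public static void main(String[] args) {
        int maxBufferedDocs = 10000, mergeFactor = 10;
        // A segment produced by merging ten 300-doc flushes...
        System.out.println(level(3000, maxBufferedDocs, mergeFactor));
        // ...computes to the same level as a single 300-doc flush,
        System.out.println(level(300, maxBufferedDocs, mergeFactor));
        // so the policy keeps re-merging the 3000-doc segment with the small ones.
    }
}
```

Had flushes actually happened at 10000 docs, those segments would compute to level 1 and the 300-doc segments would never be merged in with them.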

(I'm doing this to get an apples-to-apples performance comparison of
the current Lucene trunk vs. my patch.  "Flushing by RAM" seems like
the fair comparison, but then I have to pick maxBufferedDocs carefully
to make sure I don't hit this.)

I will open a separate issue for this.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

