lucene-dev mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject RE: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
Date Sun, 25 Mar 2007 18:17:08 GMT

"Steven Parkes" <steven_parkes@esseff.org> wrote:
> Yes, I'll separate out issues related to the basic refactor before
> submitting a candidate patch. I actually thought it might be helpful to
> keep it in the rough version to see context. But I can do that at any
> time ...

OK, it makes sense to leave it as one patch until things get closer
to ready.

> With the factored merge policy, it's easy enough to create a merge
> policy on size parallel to the one on docs. Hmmm ... suppose one could
> use derivation of one from the other or from a common base given the
> appropriate factoring of "size" in the algorithm.

Factoring out just the determination of "size" would be nice.
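
To make the idea concrete, here is a minimal Java sketch of what
factoring out "size" could look like. It is not taken from the
LUCENE-847 patch: every name in it (Segment, LevelMergePolicy,
ByDocCountPolicy, ByByteSizePolicy) is hypothetical, and the level
computation is a simplified stand-in for the real merge logic. The
point is only that the two policies share one selection algorithm and
differ in a single size() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergePolicySketch {

    /** Hypothetical stand-in for a segment; not Lucene's own segment class. */
    static final class Segment {
        final String name;
        final int docCount;
        final long sizeInBytes;

        Segment(String name, int docCount, long sizeInBytes) {
            this.name = name;
            this.docCount = docCount;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /** Shared level-based selection; subclasses only decide what "size" means. */
    static abstract class LevelMergePolicy {
        private final int mergeFactor;
        private final long baseSize; // sizes <= baseSize fall on level 0

        LevelMergePolicy(int mergeFactor, long baseSize) {
            if (mergeFactor < 2 || baseSize < 1) {
                throw new IllegalArgumentException("need mergeFactor >= 2 and baseSize >= 1");
            }
            this.mergeFactor = mergeFactor;
            this.baseSize = baseSize;
        }

        /** The one method that differs between policies. */
        abstract long size(Segment s);

        /** Level n holds sizes in (baseSize * mergeFactor^(n-1), baseSize * mergeFactor^n]. */
        int level(Segment s) {
            long sz = size(s);
            long bound = baseSize;
            int level = 0;
            while (sz > bound && bound <= Long.MAX_VALUE / mergeFactor) {
                bound *= mergeFactor;
                level++;
            }
            return level;
        }

        /** Returns the first run of mergeFactor adjacent same-level segments, or an empty list. */
        List<Segment> findMerge(List<Segment> segments) {
            int runStart = 0;
            for (int i = 1; i <= segments.size(); i++) {
                boolean sameLevel = i < segments.size()
                        && level(segments.get(i)) == level(segments.get(runStart));
                if (i - runStart == mergeFactor) {
                    return new ArrayList<>(segments.subList(runStart, i));
                }
                if (!sameLevel) {
                    runStart = i; // the run is broken; start a new one here
                }
            }
            return new ArrayList<>();
        }
    }

    static final class ByDocCountPolicy extends LevelMergePolicy {
        ByDocCountPolicy(int mergeFactor, long baseDocs) { super(mergeFactor, baseDocs); }
        @Override long size(Segment s) { return s.docCount; }
    }

    static final class ByByteSizePolicy extends LevelMergePolicy {
        ByByteSizePolicy(int mergeFactor, long baseBytes) { super(mergeFactor, baseBytes); }
        @Override long size(Segment s) { return s.sizeInBytes; }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Three segments flushed at the same RAM usage but with very different doc counts.
        List<Segment> segments = Arrays.asList(
                new Segment("_a", 100, 16 * mb),
                new Segment("_b", 5000, 16 * mb),
                new Segment("_c", 200000, 16 * mb));

        System.out.println(new ByDocCountPolicy(3, 1000).findMerge(segments).size());     // prints 0
        System.out.println(new ByByteSizePolicy(3, 32 * mb).findMerge(segments).size());  // prints 3
    }
}
```

In this toy example the three segments are the same size on disk, so
the by-size policy sees them on one level and merges them, while the
by-doc-count policy puts them on three different levels. It only
shows how the choice of "size" changes which segments look alike; it
does not reproduce the LUCENE-845 behavior itself.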

Given how serious LUCENE-845 is (over-merging when flushing by RAM), I
think we should in fact switch the default merge policy to "by size"
rather than "by doc count", but keep the "by doc count" version
available in case people want to switch back?

Especially with LUCENE-843 (where I plan to change writer to flush by
RAM usage by default) we need the default merge policy to not have
this bug.

> I really want to do some larger tests of this. I've downloaded Wikipedia
> and plan to add support for it in the benchmarker stuff (if anyone else
> is doing this, can you stop me now?) I figure I'll try it on my main
> machine and my laptop. My main machine has a lot of RAM and I wonder how
> big the corpus has to get before you see significant changes.

That sounds awesome!  I'd love to use Wikipedia for testing LUCENE-843
as well.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

