lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Wed, 06 Sep 2006 16:15:21 GMT
Just brainstorming a little...
Assuming B=1000 (max buffered docs) and M=10 (merge factor); I think
better with concrete examples.

It seems like we should avoid unnecessary merging, allowing up to 9
segments of 1000 documents or fewer without merging.  When we reach 10
segments, they should be merged into a single segment.  Let's assume
the merge creates a segment of size 8500.

Assume we write another 10 full segments that are merged into a bigger
segment of size 10,000.

It *feels* like:
 1) we should be able to write full segments of 1000 docs, or less
than that if closing the writer.
 2) we should be able to write a full segment of 1000 docs *after* a
non-full segment w/o having to merge
 3) 10,000 and 8,500 should be at the same index level, not different levels
 4) 1000 and 999 docs should be at the same index level

So, I *think* most of our hypothetical problems go away with a simple
adjustment to f(n):

f(n) = floor(log_M((n-1)/B))

Right?
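
As a quick check of the adjusted formula against the B=1000, M=10 example, here is a minimal Python sketch. The function name `level` is hypothetical, and the integer-only loops are just one way to evaluate the floor of the logarithm without floating-point rounding. It rejects n=1 because log(0) is undefined and the message does not say how that case should be treated.

```python
def level(n, B=1000, M=10):
    """Return floor(log_M((n - 1) / B)) using exact integer arithmetic."""
    if n < 2:
        # Assumption: the message does not define f(1), where (n-1)/B is 0.
        raise ValueError("f(n) is undefined for n < 2")
    x = n - 1
    if x >= B:
        # Non-negative level: largest k with B * M**k <= x.
        k = 0
        while B * M ** (k + 1) <= x:
            k += 1
        return k
    # Negative level: smallest j >= 1 with x * M**j >= B gives -j.
    j = 1
    while x * M ** j < B:
        j += 1
    return -j


for n in (999, 1000, 1001, 8500, 10000, 10001):
    print(n, level(n))
```

Under these assumptions 999 and 1000 share a level, as do 8,500 and 10,000, while 1001 and 10,001 each start the next level up.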

That allows us to write all buffered docs separately (necessary for
easy deletions), allows us to only merge M segments at a time
(decreases number of merges), and allows us to keep f(n) monotonically
non-increasing across the segments in index order.
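
To make the merge behavior concrete, here is a rough simulation, assuming a merge is triggered whenever the last M segments share the same f(n). That trigger, the `flush` helper, and the choice of ten 850-doc segments to reach 8500 are all assumptions made for illustration, not something this message specifies; deletions are not modeled.

```python
def level(n, B=1000, M=10):
    """floor(log_M((n - 1) / B)) via integer arithmetic; requires n >= 2."""
    if n < 2:
        raise ValueError("f(n) is undefined for n < 2")
    x, k = n - 1, 0
    if x >= B:
        while B * M ** (k + 1) <= x:
            k += 1
        return k
    while x * M ** (k + 1) < B:
        k += 1
    return -(k + 1)


def flush(segments, n, B=1000, M=10):
    """Append a segment of n docs, then merge while the last M share a level."""
    segments.append(n)
    while len(segments) >= M:
        tail = segments[-M:]
        if len({level(s, B, M) for s in tail}) != 1:
            break
        # Hypothetical merge: sizes simply add up (no deletions applied).
        segments[-M:] = [sum(tail)]
    return segments


segments = []
for _ in range(10):
    flush(segments, 850)     # ten non-full segments merge into 8500
for _ in range(10):
    flush(segments, 1000)    # ten full segments merge into 10000
levels = [level(s) for s in segments]
print(segments, levels)
```

In this run the full 1000-doc segments are written after the 8500-doc segment without forcing a merge with it, and the two merged segments end up at the same level.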

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


