lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Supported way to get segment from IndexWriter?
Date Fri, 15 Jan 2010 10:09:41 GMT
On Thu, Jan 14, 2010 at 7:57 PM, Chris Hostetter wrote:
> : Since SegmentInfos is now public, you could use it to
> : read the current segments_N file, and then call its .size() method?
> :
> : But, this will only count as of the last commit... which is probably
> : not sufficient for SOLR-1559?
> Honestly: I have no idea, I'm a little out of touch with what "commit"
> means in Lucene-Java these days.

Commit just means that a new segments_N file is written into the
index, so that an external reader on doing an open/reopen would see
all changes made with the IndexWriter prior to the commit.  (commit
also makes the changes "durable", ie, will survive a crash, power
loss, etc, by syncing the necessary files).
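
As a rough illustration of the visibility half of that description, here is a toy model. It is not Lucene code and uses no Lucene classes: ToyCommitModel and all of its methods are hypothetical names, the "snapshot" merely stands in for the segments_N file, and durability (syncing files) is not modeled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code and uses no Lucene classes.
final class ToyCommitModel {
    // Segments the writer currently knows about, committed or not.
    private final List<String> liveSegments = new ArrayList<>();
    // Snapshot recorded by the most recent commit; stands in for segments_N.
    private List<String> committedSegments = new ArrayList<>();

    // Simulates the writer creating a new segment.
    // External readers cannot see it until commit() is called.
    void flushSegment(String name) {
        liveSegments.add(name);
    }

    // Simulates commit: publish a new snapshot of the live segment list.
    void commit() {
        committedSegments = new ArrayList<>(liveSegments);
    }

    // What an external reader would see on open/reopen.
    int committedSegmentCount() {
        return committedSegments.size();
    }

    // What the writer itself is working with.
    int liveSegmentCount() {
        return liveSegments.size();
    }
}
```

In this model, two calls to flushSegment followed by no commit leave committedSegmentCount() at 0 while liveSegmentCount() is 2; after commit() both return 2.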

> The goal is to be able to compute a maxNumberOfSegments relative to "the
> current number of segments". Some people might perceive that as the
> current number of "committed" segments, but really it comes down to
> what optimize is going to do with the resulting number.
> If someone has the goal of making iterative micro-optimizations to their
> index, they say "optimize to $currentSegmentCount-1 segments". But if the
> number of committed segments is 3 and the number of uncommitted segments is
> 27 higher (because of active indexing), the app starts thrashing as it
> tries to optimize down from 27 to 3. That doesn't feel like "Do what I
> mean".

Right.  Merging/optimizing currently always run against all (committed
& uncommitted) segments...
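
Given that, a relative target is probably best computed from the total segment count rather than the committed count. A minimal sketch, assuming the caller has some way to obtain that total; RelativeOptimizeTarget, maxNumSegments, totalSegments and reduceBy are hypothetical names for illustration, not part of Lucene or Solr:

```java
// Hypothetical helper, not part of Lucene or Solr.
final class RelativeOptimizeTarget {
    private RelativeOptimizeTarget() {}

    // totalSegments: committed plus uncommitted segments, as the writer sees them.
    // reduceBy: how many segments the caller would like to merge away in this pass.
    // Returns a target segment count that is never below 1.
    static int maxNumSegments(int totalSegments, int reduceBy) {
        if (totalSegments < 0 || reduceBy < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return Math.max(1, totalSegments - reduceBy);
    }
}
```

With illustrative numbers: if the writer holds 30 segments in total, maxNumSegments(30, 1) gives 29, a small incremental step. Feeding in a committed count of 3 instead would give 2 and force a large merge. The result is the kind of value one would pass to IndexWriter.optimize(int maxNumSegments).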

> : We could simply make getSegmentCount public / expert / not only for tests?
> I was considering doing that in the SolrIndexWriter class that already
> exists as a subclass of IndexWriter, but i didn't want to go that route if
> there was a good reason why IndexWriter.getSegmentCount isn't already
> public (ie: if there was an expectation that the way IndexWriter manages
> segments was subject to refactoring getSegmentCount out of existence)

I think it's fine to make this public and mark it as expert, as well
as note that one can't rely on "when" IW makes new segments.  Eg,
today when you get a near real-time reader, IW creates a new segment,
but in the future it's possible it will not.  So this expert method
lets you peek into the index structure, but the count it returns
after a given series of method calls on IW is subject to change.

