lucene-dev mailing list archives

From "Ning Li" <ning.li...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Wed, 06 Sep 2006 23:23:43 GMT
On 9/6/06, Marvin Humphrey <marvin@rectangular.com> wrote:

> That's one way of thinking about it.  There's only one "thing"
> though: a big bucket of serialized index entries.  At the end of a
> session, those are sorted, pulled apart, and used to write the tis,
> tii, frq, and prx files.
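A minimal Java sketch of the "one big bucket" idea as described above. It assumes each serialized index entry is just a (term, docId, position) triple. The class and method names are hypothetical, and the sorted term -> doc -> positions map only stands in for the data that would go into the tis/tii, frq and prx files; it does not reproduce their on-disk formats. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SessionBucketSketch {
    // Hypothetical entry format: one term occurrence in one document at one position.
    record Entry(String term, int docId, int position) {}

    // The single bucket that every added document dumps its entries into.
    private final List<Entry> bucket = new ArrayList<>();
    private int nextDocId = 0;

    public int addDocument(String... tokens) {
        int docId = nextDocId++;
        for (int pos = 0; pos < tokens.length; pos++) {
            bucket.add(new Entry(tokens[pos], docId, pos));
        }
        return docId;
    }

    // End of session: sort by (term, docId, position), then pull the run apart
    // into term -> docId -> positions. Term order stands in for the dictionary
    // (tis/tii), list sizes for frequencies (frq), list contents for positions (prx).
    public Map<String, Map<Integer, List<Integer>>> finish() {
        bucket.sort(Comparator.comparing(Entry::term)
                .thenComparingInt(Entry::docId)
                .thenComparingInt(Entry::position));
        Map<String, Map<Integer, List<Integer>>> postings = new LinkedHashMap<>();
        for (Entry e : bucket) {
            postings.computeIfAbsent(e.term(), t -> new LinkedHashMap<>())
                    .computeIfAbsent(e.docId(), d -> new ArrayList<>())
                    .add(e.position());
        }
        bucket.clear();
        return postings;
    }

    public static void main(String[] args) {
        SessionBucketSketch s = new SessionBucketSketch();
        s.addDocument("quick", "brown", "fox");
        s.addDocument("brown", "dog", "brown");
        // prints {brown={0=[1], 1=[0, 2]}, dog={1=[1]}, fox={0=[2]}, quick={0=[0]}}
        System.out.println(s.finish());
    }
}
```

Nothing is inverted per document here; all the work happens in one sort at the end, which is the point of the approach being described.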

Interesting.

When do you add "merge-worthy" segments? I'd guess at the end of a
session, when it's easy to decide which segments are "merge-worthy".
If so, however, a newer doc could get a smaller docid than an older
doc, right? It's a nice property of Lucene that an older doc always
has a smaller docid. I think some applications use this to decide
newer/older versions of a document.
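A small Java sketch of why segment order matters for the "older doc has the smaller docid" property. It assumes, as in Lucene's multi-segment readers, that a document's global number is the count of documents in earlier segments (the segment's base) plus its position inside its own segment. The "key@version" labels, the class and the newestVersion helper are hypothetical stand-ins for an application that treats the larger docid as the newer version.

```java
import java.util.List;

public class DocIdOrderSketch {
    // A segment is just its documents, in the order they were added.
    record Segment(List<String> docs) {}

    // Global docid = base (documents in earlier segments) + local docid.
    static String[] assignDocIds(List<Segment> segmentsInIndexOrder) {
        int total = 0;
        for (Segment seg : segmentsInIndexOrder) total += seg.docs().size();
        String[] byDocId = new String[total];
        int base = 0;
        for (Segment seg : segmentsInIndexOrder) {
            for (int local = 0; local < seg.docs().size(); local++) {
                byDocId[base + local] = seg.docs().get(local);
            }
            base += seg.docs().size();
        }
        return byDocId;
    }

    // Hypothetical application rule: the match with the largest docid is the newest.
    static String newestVersion(String[] byDocId, String key) {
        String newest = null;
        for (String doc : byDocId) {            // ascending docid order
            if (doc.startsWith(key + "@")) newest = doc;
        }
        return newest;
    }

    public static void main(String[] args) {
        Segment older = new Segment(List.of("a@v1", "b@v1"));
        Segment session = new Segment(List.of("a@v2"));
        // Older segment first: docids follow insertion order, rule picks a@v2.
        System.out.println(newestVersion(assignDocIds(List.of(older, session)), "a"));
        // Newer session segment placed first: rule now picks the stale a@v1.
        System.out.println(newestVersion(assignDocIds(List.of(session, older)), "a"));
    }
}
```

If merge-worthy older segments end up after the session's new segment, the second case is what such an application would see.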

> In theory, you could apply this technique only to a limited number of
> docs and create segments, say, 10 docs at a time rather than 1 at a
> time.  But then you still have to do something with each 10 doc
> segment, and you don't get the benefits of less disk shuffling and
> lower RAM usage.  Better to just create 1 segment per session.

This means no new documents are visible to IndexReader until a session
is over. In some sense, "1 segment/commit per session" lets an
application decide when a "merge" happens.
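A minimal Java sketch of the visibility consequence, assuming a writer that buffers a whole session and publishes it as exactly one segment on commit, and a reader that is a point-in-time snapshot of the committed segments. All names are hypothetical; this is not Lucene's IndexWriter/IndexReader API.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionVisibilitySketch {
    // Segments published by earlier commits; readers only ever see these.
    private final List<List<String>> committedSegments = new ArrayList<>();
    // Documents added during the current session, invisible to readers.
    private final List<String> pending = new ArrayList<>();

    void addDocument(String doc) {
        pending.add(doc);
    }

    // End of session: all buffered documents become exactly one new segment.
    void commit() {
        if (pending.isEmpty()) return;
        committedSegments.add(List.copyOf(pending));
        pending.clear();
    }

    // "Open a reader": an immutable snapshot of the committed segments.
    List<List<String>> openReader() {
        return List.copyOf(committedSegments);
    }

    static int docCount(List<List<String>> reader) {
        int n = 0;
        for (List<String> segment : reader) n += segment.size();
        return n;
    }

    public static void main(String[] args) {
        SessionVisibilitySketch writer = new SessionVisibilitySketch();
        writer.addDocument("doc-1");
        writer.addDocument("doc-2");
        List<List<String>> before = writer.openReader();
        writer.commit();
        List<List<String>> after = writer.openReader();
        System.out.println(docCount(before) + " " + docCount(after)); // 0 2
    }
}
```

Here the application's choice of when to call commit() is also its choice of when a new segment appears, which is the sense in which it decides when a "merge" happens.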

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

