lucene-java-user mailing list archives

From David Allouche
Subject Live index upgrading
Date Mon, 17 Jun 2019 15:41:31 GMT

I use Lucene with PyLucene on a public-facing web application. We have a moderately large
index (~24M documents, ~11GB index data), with a constant stream of new documents.

I recently upgraded to PyLucene 7.

When testing the new PyLucene 8 release, I encountered an IndexFormatTooOldException
because my index conversion from Lucene 6 to Lucene 7 was not complete.

I found IndexUpgrader, and I had a look at its implementation. I would very much like to avoid
taking the service down during the index upgrade, so I believe I cannot use IndexUpgrader:
it needs the write lock, which the web application must hold to index new documents.

So I figure I could get the desired result with an IndexWriter.forceMerge(1). But the documentation
says "This is a horribly costly operation, especially when you pass a small maxNumSegments;
usually you should only call this if the index is static (will no longer be changed)."

And indeed, forceMerge tends to be killed by the kernel OOM killer on my development VM. I want
to avoid this failure mode in production. I could increase the VM's memory until it works, but I
would rather have a less brutal approach to upgrading a live index. Something that could run in
the background with reasonable amounts of anonymous memory.

What is the recommended approach to upgrading a live index?

How can I know from the code that the index needs upgrading at all? I could add a manual knob
to start an upgrade, but it would be better if it occurred transparently when I upgrade PyLucene.
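
To make the second question concrete, here is a minimal sketch of the kind of check I have in
mind. It assumes that SegmentInfos.readLatestCommit can be called from PyLucene without taking
the write lock, and that each segment's version is reachable as sci.info.getVersion().major.
The helper names (segments_needing_upgrade, read_segment_majors) are hypothetical, and I have
not verified this against a production index. It would have to run under a Lucene version that
can still open the index (Lucene 7 in my case).

```python
def segments_needing_upgrade(segment_majors, current_major):
    """Return the major versions in segment_majors that are older than current_major."""
    return [major for major in segment_majors if major < current_major]


def read_segment_majors(index_path):
    """Read the Lucene major version of every segment in the latest commit.

    Assumption: this only reads the commit point and does not need the write lock.
    PyLucene is imported lazily so the pure helper above works without it.
    """
    import lucene
    from java.nio.file import Paths
    from org.apache.lucene.index import SegmentInfos
    from org.apache.lucene.store import FSDirectory

    # Start the JVM only if it has not been started in this process yet.
    if lucene.getVMEnv() is None:
        lucene.initVM()

    directory = FSDirectory.open(Paths.get(index_path))
    try:
        infos = SegmentInfos.readLatestCommit(directory)
        # Each element is a SegmentCommitInfo; .info is its SegmentInfo.
        return [sci.info.getVersion().major for sci in infos]
    finally:
        directory.close()
```

With this, the application could call segments_needing_upgrade(read_segment_majors(path), 7)
at startup and schedule a background upgrade only when the result is non-empty.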
