lucy-dev mailing list archives

From "David E. Wheeler" <>
Subject [lucy-dev] On Transactionality and Performance
Date Wed, 23 Mar 2011 02:00:29 GMT

I ended up rewriting the PGXN schema into multiple schemas after consulting with Graham Barr
on how CPAN search works. I'm pretty happy with the results so far, but have a few questions
about how indexing transactions work.

* Why does `commit()` invalidate an Indexer object?

* Should I be making as many changes to an index as I can before calling `commit()`, or can
I update bits at a time using separate index objects?

* Is there a way to invalidate an IndexSearcher object when an index changes? Or do I just
need to create a new searcher for every request? If the latter, how efficient is the constructor?
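For context, here's roughly the pattern I've got now. This is just a sketch: I'm assuming the Lucy class names, and `$schema` and the field names are placeholders, not my real PGXN schema:

```perl
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;

# One indexer per batch of changes; commit() finalizes the batch.
my $indexer = Lucy::Index::Indexer->new(
    index  => '/path/to/index',
    schema => $schema,        # placeholder schema object
    create => 1,
);
$indexer->add_doc({ dist => 'pgTAP', abstract => 'Unit testing for Postgres' });
$indexer->commit;  # After this, can I reuse $indexer, or is it dead?

# A searcher opened afterward sees the committed docs.
my $searcher = Lucy::Search::IndexSearcher->new( index => '/path/to/index' );
```

It's that last step I'm unsure about: whether I can hold on to `$searcher` across commits, or have to reopen it.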

These questions stem mainly from my being a database geek, so I tend to think in database-style
transactions. To wit:

* If I have to update lots of rows, it's more efficient to use transactions to do a batch at
a time. For example, if I need to update 1,000 rows, I might update 100 at a time in ten separate transactions.

* Once I've committed a transaction, all other connections can see the changes.
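In Postgres terms, I mean something like this (table and column names are just for illustration):

```sql
BEGIN;
UPDATE docs SET indexed_at = now() WHERE id BETWEEN 1 AND 100;
COMMIT;
-- Repeat for the next 100 ids; each COMMIT makes that batch
-- visible to every other connection.
```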

But I'm starting to suspect this isn't the best way to do it with Lucy/KinoSearch. Is it better to:

* Update all 1,000 objects in a single transaction (one indexer, calling `commit()` at the end)?

* Always create a new IndexSearcher for new requests in order to see any changes? (I found
in tests I was writing that if I updated an index, an existing IndexSearcher did *not* see
the change -- maybe it was caching results for performance?)
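If that's the right model, I'd restructure to something like this. Again a sketch, assuming the Lucy class names; `$schema` and `@all_docs` stand in for my real schema and documents:

```perl
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;

# One indexer for the whole batch, with a single commit() at the end.
my $indexer = Lucy::Index::Indexer->new(
    index  => '/path/to/index',
    schema => $schema,          # placeholder schema object
);
$indexer->add_doc($_) for @all_docs;
$indexer->commit;

# Every request opens a fresh searcher, so it sees the latest commit.
sub search {
    my $query    = shift;
    my $searcher = Lucy::Search::IndexSearcher->new( index => '/path/to/index' );
    return $searcher->hits( query => $query );
}
```

Whether that per-request `IndexSearcher->new` is cheap enough is exactly my third question above.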

Thank you for your patience with my newbish questions.


