From: "David E. Wheeler" <david@kineticode.com>
Date: Tue, 22 Mar 2011 22:00:29 -0400
Message-Id: <0D140267-39B7-4A12-A843-3B74DD5FB64A@kineticode.com>
To: lucy-dev@incubator.apache.org
Subject: [lucy-dev] On Transactionality and Performance

Lucites,

I ended up rewriting the PGXN schema into multiple schemas after
consulting with Graham Barr on how CPAN search works. I'm pretty happy
with the results so far, but have a few questions about how indexing
transactions work.

* Why does `commit()` invalidate an Indexer object?

* Should I be making as many changes to an index as I can before calling
`commit()`, or can I update it a bit at a time using separate Indexer objects?

* Is there a way to invalidate an IndexSearcher object when an index
changes? Or do I just need to create a new searcher for every request?
If the latter, how efficient is the constructor?

These questions stem mainly from being a database geek, so I tend to
think in database-style transactions. To wit:

* If I have to update lots of rows, it's more efficient to use
transactions to do a few at a time. For example, if I need to update
1,000 rows, I might update 100 at a time in separate transactions.

* Once I've committed a transaction, all other connections can see the
changes.
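For reference, the database-style pattern described above looks roughly like this with Python's built-in sqlite3 module. This is a minimal sketch of the database side only, not anything from Lucy; the table `rows`, the column `val`, and the helpers `update_in_batches` and `demo` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def update_in_batches(conn, ids, batch_size=100):
    """Update rows one batch at a time, committing after each batch.

    Returns the number of transactions committed.
    """
    commits = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # sqlite3 implicitly opens a transaction before the first UPDATE.
        conn.executemany(
            "UPDATE rows SET val = val + 1 WHERE id = ?",
            [(i,) for i in batch],
        )
        conn.commit()  # end this batch's transaction
        commits += 1
    return commits


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, val INTEGER)")
        writer.executemany("INSERT INTO rows VALUES (?, 0)",
                           [(i,) for i in range(1000)])
        writer.commit()

        reader = sqlite3.connect(path)  # a second, independent connection
        commits = update_in_batches(writer, list(range(1000)), batch_size=100)
        # Once the batches are committed, the other connection sees them.
        total = reader.execute("SELECT SUM(val) FROM rows").fetchone()[0]
        writer.close()
        reader.close()
    return commits, total


print(demo())  # (10, 1000)
```

So 1,000 rows in batches of 100 means ten separate commits, and a second connection sees all of them afterward. That is the mental model I'm bringing to Lucy.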

But I'm starting to suspect this isn't the best way to do it with
Lucy/KinoSearch. Is it better to:

* Update all 1,000 objects in a single transaction (one indexer, calling
`commit()` at the end)?

* Always create a new IndexSearcher for each new request in order to see any
changes? (I found in tests I was writing that if I updated an index, an
existing IndexSearcher did *not* see the change. Maybe it was caching
results for performance?)
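What my test showed would also be consistent with a searcher that reads a fixed point-in-time view of the index, rather than one caching results. To check my own understanding, here is a toy model of that assumption. `ToyIndex` and `ToySearcher` are hypothetical classes, not Lucy's API, and this is not a claim about how Lucy is actually implemented.

```python
class ToyIndex:
    """Hypothetical stand-in for an index: every commit publishes a new,
    immutable generation of documents instead of editing the old one."""

    def __init__(self):
        self._generations = [()]  # generation 0 is the empty index

    def latest(self):
        return self._generations[-1]

    def commit(self, docs):
        # Older generations stay intact, so existing readers are unaffected.
        self._generations.append(self.latest() + tuple(docs))


class ToySearcher:
    """Hypothetical searcher that pins the generation that was current when
    it was constructed; commits made afterwards are invisible to it."""

    def __init__(self, index):
        self._docs = index.latest()

    def search(self, term):
        # Plain substring match, standing in for a real query.
        return [doc for doc in self._docs if term in doc]


index = ToyIndex()
index.commit(["pgxn extension"])
old = ToySearcher(index)          # opened before the second commit
index.commit(["pgxn distribution"])
new = ToySearcher(index)          # opened after it

print(len(old.search("pgxn")))    # 1
print(len(new.search("pgxn")))    # 2
```

If that assumption holds, the old searcher isn't stale by accident. It simply never sees later commits, and a fresh searcher is needed to pick them up.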

Thank you for your patience with my newbish questions.

Best,

David