From: "David E. Wheeler"
Date: Tue, 22 Mar 2011 22:00:29 -0400
Message-Id: <0D140267-39B7-4A12-A843-3B74DD5FB64A@kineticode.com>
To: lucy-dev@incubator.apache.org
Subject: [lucy-dev] On Transactionality and Performance

Lucites,

I ended up rewriting the PGXN schema into multiple schemas after
consulting with Graham Barr on how CPAN search works. I'm pretty happy
with the results so far, but have a few questions about how indexing
transactions work.

* Why does `commit()` invalidate an Indexer object?

* Should I be making as many changes to an index as I can before
  calling `commit()`, or can I update bits at a time using separate
  Indexer objects?

* Is there a way to invalidate an IndexSearcher object when an index
  changes? Or do I just need to create a new searcher for every
  request? If the latter, how efficient is the constructor?

These questions stem mainly from being a database geek, so I tend to
think in database-style transactions. To wit:

* If I have to update lots of rows, it's more efficient to use
  transactions to do a few at a time. For example, if I need to update
  1,000 rows, I might update 100 at a time in separate transactions.

* Once I've committed a transaction, all other connections can see the
  changes.

But I'm starting to suspect this isn't the best way to do it with
Lucy/KinoSearch. Is it better to:

* Update all 1,000 objects in a single transaction (one indexer,
  calling `commit()` at the end)?

* Always create a new IndexSearcher for new requests in order to see
  any changes? (I found in tests I was writing that if I updated an
  index, an existing IndexSearcher did *not* see the change -- maybe it
  was caching results for performance?)

Thank you for your patience with my newbish questions.

Best,

David