Date: Wed, 27 Mar 2002 09:49:13 +0100
From: Peter Sojan
To: lucene-user@jakarta.apache.org
Subject: Database integration best practices ...

Hi!

Like many others, I want to use Lucene as a front end for searching content that is buried in a relational database. As far as I can see, this should be no problem if I build one Lucene document per table row.

Since many of you have already taken this approach, I would appreciate any suggestions on the following issues:

- Consistency
  What is the best way to keep the database and the Lucene index consistent? I can think of two solutions:
    - update the index on every insert
    - ignore the index at insert time and do a full reindex periodically (e.g. nightly)

- Transactional issues
  What is the best way to make a database insert and the corresponding index insert atomic?

- Content separation
  My content in the database is spread across multiple tables.
  However, there are clusters of related tables; for example, I have three tables describing the authors of papers. My solution would be a separate index for each of those clusters. When the user searches, every index must of course be searched separately... Is maintaining a separate index for every "topic" a good idea?

One might ask why I don't search the database directly. Well, I would have to build a search interface (think of handling boolean queries) on my own, which I definitely do not have time for. Additionally, my database (PostgreSQL) doesn't support full-text search (yet).

Any additional input on your experiences is very welcome!

Thanks in advance,
Peter
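The first consistency option ("update the index on every insert") combined with the atomicity question can be sketched roughly as follows. This is a minimal, stdlib-only sketch under stated assumptions, not a tested recipe: `FakeTable` and `FakeIndex` are hypothetical in-memory stand-ins (in real code the table would be a JDBC connection with auto-commit turned off and the index would be a Lucene `IndexWriter`), and `insertRow` is a hypothetical helper. The idea is to write to the index while the database transaction is still open, roll the database back if indexing fails, and delete the just-indexed document if the database commit fails.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of write-through indexing with a compensating action.
 * All types here are hypothetical stand-ins, not JDBC or Lucene classes.
 * This is NOT a real two-phase commit: a crash between steps can still
 * leave the database and the index out of sync.
 */
public class WriteThroughIndexer {

    /** Hypothetical transactional table: inserts stay pending until commit. */
    static class FakeTable {
        final Map<Integer, String> committed = new HashMap<>();
        private final Map<Integer, String> pending = new HashMap<>();
        boolean failOnCommit = false; // switch to simulate a commit failure

        void insert(int id, String text) { pending.put(id, text); }

        void commit() {
            if (failOnCommit) throw new IllegalStateException("commit failed");
            committed.putAll(pending);
            pending.clear();
        }

        void rollback() { pending.clear(); }
    }

    /** Hypothetical full-text index keyed by the row's primary key. */
    static class FakeIndex {
        final Map<Integer, String> docs = new HashMap<>();
        boolean failOnAdd = false; // switch to simulate an index write failure

        void add(int id, String text) {
            if (failOnAdd) throw new IllegalStateException("index write failed");
            docs.put(id, text);
        }

        void delete(int id) { docs.remove(id); }
    }

    /** Inserts and indexes one row; returns true only if both sides succeeded. */
    static boolean insertRow(FakeTable table, FakeIndex index, int id, String text) {
        table.insert(id, text);
        try {
            index.add(id, text);   // index while the DB transaction is still open
        } catch (RuntimeException e) {
            table.rollback();      // index write failed: undo the pending DB insert
            return false;
        }
        try {
            table.commit();
        } catch (RuntimeException e) {
            table.rollback();
            index.delete(id);      // commit failed: compensate by removing the document
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        FakeIndex index = new FakeIndex();

        System.out.println(insertRow(table, index, 1, "Lucene and databases")); // true

        index.failOnAdd = true;
        System.out.println(insertRow(table, index, 2, "never indexed"));        // false
        index.failOnAdd = false;

        table.failOnCommit = true;
        System.out.println(insertRow(table, index, 3, "never committed"));      // false

        // Only row 1 survives on both sides.
        System.out.println(table.committed.keySet() + " " + index.docs.keySet()); // [1] [1]
    }
}
```

Because the compensation step can itself be lost (for example, if the process dies right after indexing), the second option from the post, a periodic full reindex, still seems worth keeping as a safety net rather than treating the two options as mutually exclusive.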