From: John Cwikla
To: 'Lucene Users List'
Subject: RE: Concurency in Lucene
Date: Wed, 16 Oct 2002 15:52:30 -0700

I personally would love to see this become part of Lucene, or just sent to me :). We are doing much the same thing with Lucene and have been running into exactly these problems, especially since we have also created per-account indexes for our own users, for safety and ease of rebuilding, and we run into this all the time. We were about to start investigating what could be done in the direction you have taken.

John Cwikla

-----Original Message-----
From: kiril.zack@epiphany.com [mailto:kiril.zack@epiphany.com]
Sent: Wednesday, October 16, 2002 3:45 PM
To: lucene-user@jakarta.apache.org
Subject: Concurency in Lucene

My company, Epiphany, has decided to integrate our products with Lucene.
I'm leading this effort, and as part of it I have developed a solution around Lucene that allows concurrent processes to search, insert, update and delete documents. The solution addresses the following:

- Concurrent writing (insert, update, delete) to the index (see http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html).
- The non-transactional nature of Lucene. The solution wraps every insert, update and delete in a transaction, and all writes are guaranteed to reach the index eventually.
- Running out of file handles.
- Bookkeeping. The solution does all of it, so clients do not have to worry about when to open and close IndexReader/IndexWriter. Technically a client could open and close them around every operation, but creating and deleting the .lock file each time slows things down.

In summary, every write (insert, update, delete) goes to a log file first. A worker thread wakes up periodically, examines the logs, and decides whether to propagate the changes (this policy is configurable). If it decides to propagate, the thread creates new log files, locks the current log files, makes a copy of the index, merges the changes from the logs into that copy, then hot-swaps the newly created index into place and deletes the old logs and the old index. At any given time, search results will not contain deleted documents, but newly created or updated documents will not appear in search results until the merge has finished. The worker thread also keeps track of the state of the logs and index so it can recover after a crash.

These were the factors that drove this solution:

- The need for concurrent, non-blocking writes (insert, update, delete).
- The need for deleted documents to stop showing up in query results (Hits) as soon as they are deleted.
- Lucene does not handle crashes well. The mentality is "if in doubt, redo the index", which does not work in some cases.
  Rebuilding the index is fast, but in our case (a) it consumes too many non-Lucene resources (the documents can be stored in a database), and (b) high availability of search is a requirement.
- Lucene can leave .lock files behind.
- Lucene keeps state (documents) in memory.

I wanted to gauge how much interest there is in such a solution and whether the Lucene developers feel it should be part of Lucene. If there is enough interest, I would like to donate this code to Lucene.

Thanks,
Kiril Zack
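No code was posted with the proposal, so the write path it describes (every write logged first, a background worker that periodically rotates the log, applies it to a copy of the index and hot-swaps the copy in, with pending deletes hidden from searches immediately) is sketched below purely as an illustration. It is not Kiril Zack's implementation and does not use Lucene's API: the "index" is a hypothetical in-memory `Map` stand-in, the log lives in memory rather than in a file (so crash recovery is left out), and `LoggedIndex`, `minOpsToMerge` and every other name here are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the log-then-merge design; NOT the original code and NOT Lucene's API.
// The "index" is an in-memory Map stand-in and the log is kept in memory, not in a file.
class LoggedIndex {
    private enum Kind { ADD, DELETE }

    private static final class Op {
        final Kind kind;
        final String id;
        final String text;
        Op(Kind kind, String id, String text) { this.kind = kind; this.id = id; this.text = text; }
    }

    // One generation of the write log (a real version would append to a file before returning).
    private static final class Log {
        final List<Op> ops = new ArrayList<>();
        final Set<String> deletes = new HashSet<>();
    }

    private final Object lock = new Object();   // guards index, current and merging
    private final int minOpsToMerge;            // hypothetical "propagate or not" policy
    private Map<String, String> index = Collections.emptyMap(); // searchable snapshot, never mutated
    private Log current = new Log();            // receives new writes
    private Log merging = null;                 // frozen log being merged, if any
    private ScheduledExecutorService worker;

    LoggedIndex(int minOpsToMerge) { this.minOpsToMerge = minOpsToMerge; }

    void insert(String id, String text) { append(new Op(Kind.ADD, id, text)); }

    void delete(String id) { append(new Op(Kind.DELETE, id, null)); }

    // Update = delete + insert: the old version is hidden at once, the new one appears after a merge.
    void update(String id, String text) {
        synchronized (lock) {
            append(new Op(Kind.DELETE, id, null));
            append(new Op(Kind.ADD, id, text));
        }
    }

    private void append(Op op) {
        synchronized (lock) {
            current.ops.add(op);
            if (op.kind == Kind.DELETE) current.deletes.add(op.id);
        }
    }

    // Returns sorted ids whose text contains term, hiding documents with deletes not yet merged.
    List<String> search(String term) {
        Map<String, String> snapshot;
        Set<String> hidden = new HashSet<>();
        synchronized (lock) {
            snapshot = index;
            hidden.addAll(current.deletes);
            if (merging != null) hidden.addAll(merging.deletes);
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            if (!hidden.contains(e.getKey()) && e.getValue().contains(term)) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    // One worker pass: rotate the log, apply it to a copy of the index, then hot-swap the copy in.
    synchronized boolean merge() {
        Log frozen;
        Map<String, String> base;
        synchronized (lock) {
            if (current.ops.isEmpty() || current.ops.size() < minOpsToMerge) return false;
            frozen = current;
            merging = frozen;       // its deletes stay hidden while the merge runs
            current = new Log();    // new writes go to a fresh log
            base = index;
        }
        Map<String, String> copy = new HashMap<>(base); // built without holding the lock
        for (Op op : frozen.ops) {
            if (op.kind == Kind.ADD) copy.put(op.id, op.text); else copy.remove(op.id);
        }
        synchronized (lock) {
            index = Collections.unmodifiableMap(copy); // hot swap
            merging = null;                            // its deletes are now applied to the index
        }
        return true;
    }

    // Runs merge() periodically on a single background thread.
    void start(long periodMillis) {
        worker = Executors.newSingleThreadScheduledExecutor();
        worker.scheduleAtFixedRate(this::merge, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void close() throws InterruptedException {
        if (worker != null) {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```

In use, one might construct `new LoggedIndex(100)`, call `start(5000)` so the worker checks the log every five seconds, and call `close()` on shutdown. Running merges on a single worker thread and building the new index on a copy is what lets writers and searchers carry on without waiting for a merge to finish, at the cost of new and updated documents showing up only after the next swap.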