From: Zach Bailey
Organization: Hannon Hill
Date: Thu, 02 Aug 2007 11:56:44 -0400
To: java-user@lucene.apache.org
Subject: Re: Clustered Indexing on common network filesystem
Message-ID: <46B1FEBC.6000208@hannonhill.com>
In-Reply-To: <148883.76636.qm@web52005.mail.re2.yahoo.com>

Rajesh,

I forgot to mention this, but we did investigate this option as well and
even prototyped it for an internal project. It ended up being too slow
for us. It was adding a lot of overhead even to small updates, IIRC,
mainly due to the fact that the index was essentially stored as a
filesystem in the database. As you can probably imagine, using a
database as a filesystem is not very performant.

Rajesh parab wrote:
> One more alternative, though I am not sure if anyone is using it.
>
> Apache Compass has added a plug-in to allow storing Lucene index files
> inside the database. This should work in clustered environment as all
> nodes will share the same database instance.
>
> I am not sure the impact it will have on performance.
>
> Is anyone using DB for index storage? Any drawbacks of this approach?
>
> Regards,
> Rajesh
>
> --- Zach Bailey wrote:
>
>> Thanks for your response --
>>
>> Based on my understanding, hadoop and nutch are essentially the same
>> thing, with nutch being derived from hadoop, and are primarily intended
>> to be standalone applications.
>>
>> We are not looking for a standalone application, rather we must use a
>> framework to implement search inside our current content management
>> application. Currently the application search functionality is designed
>> and built around Lucene, so migrating frameworks at this point is not
>> feasible.
>>
>> We are currently re-working our back-end to support clustering (in
>> tomcat) and we are looking for information on the migration of Lucene
>> from a single node filesystem index (which is what we use now and hope
>> to continue to use for clients with a single-node deployment) to a
>> shared filesystem index on a mounted network share.
>>
>> We prefer to use this strategy because it means we do not have to have
>> two disparate methods of managing indexes for clients who run in a
>> single-node, non-clustered environment versus clients who run in a
>> multiple-node, clustered environment.
>>
>> So, hopefully here are some easy questions someone could shed some
>> light on:
>>
>> Is this not a recommended method of managing indexes across multiple
>> nodes?
>>
>> At this point would people recommend storing an individual index on each
>> node and propagating index updates via a JMS framework rather than
>> attempting to handle it transparently with a single shared index?
>>
>> Is the Lucene index code so intimately tied to filesystem semantics that
>> using a shared/networked file system is infeasible at this point in
>> time?
>>
>> What would be the quickest time-to-implementation of these strategies
>> (JMS vs. shared FS)? The most robust/least error-prone?
>>
>> I really appreciate any insight or response anyone can provide, even if
>> it is a short answer to any of the related topics, "i.e. we implemented
>> clustered search using per-node indexing with JMS update propagation and
>> it works great", or even something as simple as "don't use a shared
>> filesystem at this point".
>>
>> Cheers,
>> -Zach
>>
>> testn wrote:
>>> Why don't you check out Hadoop and Nutch? It should provide what you
>>> are looking for.
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org