Message-ID: <19211497.post@talk.nabble.com>
Date: Thu, 28 Aug 2008 17:00:25 -0700 (PDT)
From: rahul_k123
To: java-user@lucene.apache.org
Subject: Re: Replicating Lucene Index with out SOLR
In-Reply-To: <348886.93561.qm@web50309.mail.re2.yahoo.com>
References: <19193696.post@talk.nabble.com>
 <348886.93561.qm@web50309.mail.re2.yahoo.com>

Do I need to stop indexing when I rsync the snapshot to the slave?


Otis Gospodnetic wrote:
>
> Yes, I think you pinpointed what I see over and over with Solr. The two
> desires pull in opposite directions. I think Jason Rutherglen is very
> keen to start talking about Lucene clusters and index replication in such
> clusters without using the classic master/slave approach.
>
> Jason, want to start a thread on java-dev?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: mark harwood
>> To: java-user@lucene.apache.org
>> Sent: Thursday, August 28, 2008 6:21:19 AM
>> Subject: Re: Replicating Lucene Index with out SOLR
>>
>> >> You don't need to copy the whole index every time
>> >> if you do incremental indexing/updates and don't optimize the index
>>
>> But at 5 minute intervals for replication, does this not quickly lead
>> to a very fragmented index?
>>
>> It seems there is a fundamental conflict when building replication
>> systems based entirely on the Lucene file format:
>> * In the interests of good search performance, the index should ideally
>>   be a small number of large files (which is what mergepolicy/optimize
>>   are all about maintaining).
>> * However, in the interest of minimising replication network traffic,
>>   the ideal is a large number of small files.
>>
>> I've previously built replication systems which rely on each server
>> pulling deltas in the form of insert/update/delete records from a
>> database and using IndexWriter locally on each server to apply these
>> sets of changes.
>> Obviously this duplicates the analyzing/indexing effort across replicas,
>> but it does mean the content being transferred is not restricted by the
>> design of the Lucene file format, and therefore uses minimal network
>> traffic and places no restrictions on the IndexWriter merge policies I
>> may choose to use to optimise search speed.
>>
>> Keen to explore the pros and cons of these different replication schemes.
>>
>> Cheers,
>> Mark
>>
>> --- On Thu, 28/8/08, rahul_k123 wrote:
>>
>> > From: rahul_k123
>> > Subject: Re: Replicating Lucene Index with out SOLR
>> > To: java-user@lucene.apache.org
>> > Date: Thursday, 28 August, 2008, 6:47 AM
>> >
>> > Can I make use of the Solr scripts for this purpose?
>> >
>> > The snapinstaller runs on the slave after a snapshot has been pulled
>> > from the master. This signals the local Solr server to open a new
>> > index reader, then auto-warming of the cache(s) begins (in the new
>> > reader), while other requests continue to be served by the original
>> > index reader.
>> >
>> > How can I achieve the above in my case?
>> >
>> > Otis Gospodnetic wrote:
>> > >
>> > > You don't need to copy the whole index every time if you do
>> > > incremental indexing/updates and don't optimize the index before
>> > > copying. If you use rsync for copying the index, only the
>> > > new/modified files will be copied. This is what the Solr
>> > > replication scripts do, too.
>> > >
>> > > Otis
>> > > --
>> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> > >
>> > > ----- Original Message ----
>> > >> From: rahul_k123
>> > >> To: general@lucene.apache.org
>> > >> Sent: Wednesday, August 27, 2008 11:36:07 PM
>> > >> Subject: Re: Replicating Lucene Index with out SOLR
>> > >>
>> > >> Currently we index on A at regular intervals.
>> > >>
>> > >> - copy the index
>> > >> Copying the whole index every time?
>> > >>
>> > >> Currently I am investigating how I can make use of the Solr
>> > >> replication scripts to achieve this.
>> > >>
>> > >> Is there anyone who did this without Solr before?
>> > >>
>> > >> Thanks
>> > >>
>> > >> Otis Gospodnetic wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > You may want to ask on the java-user list (more subscribers),
>> > >> > which I'm CC-ing, so we can continue discussion there.
>> > >> > I think you will have to implement your own logic that runs on A
>> > >> > and does something like this:
>> > >> >
>> > >> > - stop adding new docs
>> > >> > - call commit on the IndexWriter
>> > >> > - copy the index
>> > >> > - resume indexing
>> > >> >
>> > >> > Otis
>> > >> > --
>> > >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> > >> >
>> > >> > ----- Original Message ----
>> > >> >> From: rahul_k123
>> > >> >> To: general@lucene.apache.org
>> > >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
>> > >> >> Subject: Replicating Lucene Index with out SOLR
>> > >> >>
>> > >> >> I have the following requirement.
>> > >> >>
>> > >> >> Right now we have multiple indexes serving our web application.
>> > >> >> Our indexes are around 30 GB in size.
>> > >> >>
>> > >> >> We want to replicate the index data so that we can use the
>> > >> >> replicas to distribute the search load.
>> > >> >>
>> > >> >> This is what we need ideally:
>> > >> >>
>> > >> >> A – (supports writes and reads)
>> > >> >>
>> > >> >> A1 – Replicated index (supports reads). We want to synchronize
>> > >> >> this every 5 mins.
>> > >> >>
>> > >> >> Any help is appreciated.
>> > >> >> We are not using Solr.
>> > >> >>
>> > >> >> I am also interested in knowing the best way to scale my
>> > >> >> application by adding more boxes for search if our load
>> > >> >> increases.
>> > >> >>
>> > >> >> Thanks.
>> > >> >>
>> > >> >> --
>> > >> >> View this message in context:
>> > >> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
>> > >> >> Sent from the Lucene - General mailing list archive at Nabble.com.
>> > >> >
>> > >>
>> > >> --
>> > >> View this message in context:
>> > >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19193670.html
>> > >> Sent from the Lucene - General mailing list archive at Nabble.com.
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> >
>> > --
>> > View this message in context:
>> > http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19194576.html
>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
View this message in context:
http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19211497.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
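
The steps Otis lists in the quoted thread (stop adding new docs, commit, copy the index, resume indexing) could be sketched roughly as below, combined with the rsync-style idea of copying only new or changed files. This is a minimal Python sketch under stated assumptions, not Solr's or Lucene's implementation: `indexing_lock`, `plan_copy`, `sync_index` and the `commit` callback are hypothetical names, `commit` merely stands in for calling commit on the IndexWriter, and files are compared by name and size only, which is simpler than what rsync actually does.

```python
import shutil
import threading
from pathlib import Path

# Hypothetical lock: the indexing code is assumed to hold it while adding docs.
indexing_lock = threading.Lock()


def plan_copy(src: Path, dst: Path) -> list:
    """Return names of files in src that are missing from dst or differ in size."""
    names = []
    for f in sorted(src.iterdir()):
        if not f.is_file():
            continue
        target = dst / f.name
        if not target.exists() or target.stat().st_size != f.stat().st_size:
            names.append(f.name)
    return names


def sync_index(src: Path, dst: Path, commit=lambda: None) -> list:
    """Pause indexing, commit, copy changed files, drop stale ones, then resume."""
    dst.mkdir(parents=True, exist_ok=True)
    with indexing_lock:                  # stop adding new docs
        commit()                         # stand-in for IndexWriter commit
        copied = plan_copy(src, dst)
        for name in copied:              # copy the index (changed files only)
            shutil.copy2(src / name, dst / name)
        live = {f.name for f in src.iterdir() if f.is_file()}
        for f in dst.iterdir():          # remove files the source no longer has
            if f.is_file() and f.name not in live:
                f.unlink()
    return copied                        # lock released here: indexing resumes
```

Calling `sync_index(Path("/path/to/index"), Path("/path/to/replica"))` (both paths are placeholders) returns the names of the files it copied; a second call with no changes in between copies nothing. Whether indexing really has to pause depends on how the snapshot is taken; the sketch simply follows the stop/commit/copy/resume outline from the thread.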