Date: Thu, 28 Aug 2008 17:00:25 -0700 (PDT)
From: rahul_k123 <vishnudeepak@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: Replicating Lucene Index with out SOLR


Do I need to stop indexing while I rsync a snapshot to the slave?
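
As far as I understand it, the Solr snapshooter script avoids a long pause by hard-linking the index files into a snapshot directory right after a commit, so indexing only has to wait while the links are made and rsync then copies from the snapshot. Below is a minimal Java sketch of that idea, not a tested recipe. It assumes index files are never modified in place once written and that the snapshot directory is on the same filesystem as the index. `IndexSnapshot` and `create` are hypothetical names, not Lucene or Solr API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Lucene or Solr. */
final class IndexSnapshot {
    private IndexSnapshot() {}

    /**
     * Hard-links every regular file in indexDir into snapshotDir.
     * Intended to be called right after a commit, while no docs are being added.
     * snapshotDir is assumed to be new (or empty).
     * Returns the number of files placed in the snapshot.
     */
    static int create(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                Path link = snapshotDir.resolve(file.getFileName());
                try {
                    // A hard link costs no extra disk space and is near-instant.
                    Files.createLink(link, file);
                } catch (UnsupportedOperationException e) {
                    // Fallback for filesystems without hard links: plain copy.
                    Files.copy(file, link);
                }
                count++;
            }
        }
        return count;
    }
}
```

With something like this, the sequence would be: pause adds, commit, call `IndexSnapshot.create(...)`, resume indexing, then rsync the snapshot directory to the slave. The snapshot stays intact even if the writer later deletes merged-away segment files from the live index.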


Otis Gospodnetic wrote:
>
> Yes, I think you pinpointed what I see over and over with Solr.  The two
> desires pull in opposite directions.  I think Jason Rutherglen is very
> keen to start talking about Lucene clusters and index replication in such
> clusters without using the classic master/slave approach.
>
> Jason, want to start a thread on java-dev?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: mark harwood <markharw00d@yahoo.co.uk>
>> To: java-user@lucene.apache.org
>> Sent: Thursday, August 28, 2008 6:21:19 AM
>> Subject: Re: Replicating Lucene Index with out SOLR
>>
>> >> You don't need to copy the whole index every time
>> >> if you do incremental indexing/updates and don't optimize the index
>>
>> But at 5 minute intervals for replication, does this not quickly lead to a
>> very fragmented index?
>>
>> It seems there is a fundamental conflict when building replication
>> systems based entirely on the Lucene file format:
>> * In the interests of good search performance, the index should ideally be
>>   a small number of large files (which is what mergepolicy/optimize are all
>>   about maintaining)
>> * However, in the interest of minimising replication network traffic, the
>>   ideal is a large number of small files.
>>
>> I've previously built replication systems which rely on each server
>> pulling deltas in the form of insert/update/delete records from a database
>> and using IndexWriter locally on each server to apply these sets of changes.
>> Obviously this duplicates the analyzing/indexing effort across replicas, but
>> it does mean the content being transferred is not restricted by the design
>> of the Lucene file format and therefore uses minimal network traffic and
>> places no restrictions on the IndexWriter merge policies I may choose to use
>> to optimise search speed.
>>
>> Keen to explore the pros and cons of these different replication schemes.
>>
>> Cheers,
>> Mark
>>
>> --- On Thu, 28/8/08, rahul_k123 wrote:
>>
>> > From: rahul_k123
>> > Subject: Re: Replicating Lucene Index with out SOLR
>> > To: java-user@lucene.apache.org
>> > Date: Thursday, 28 August, 2008, 6:47 AM
>> >
>> > Can I make use of the Solr scripts for this purpose?
>> >
>> > The snapinstaller runs on the slave after a snapshot has been pulled from
>> > the master. This signals the local Solr server to open a new index reader,
>> > then auto-warming of the cache(s) begins (in the new reader), while other
>> > requests continue to be served by the original index reader.
>> >
>> > How can I achieve the above in my case?
>> >
>> > Otis Gospodnetic wrote:
>> > >
>> > > You don't need to copy the whole index every time if you do incremental
>> > > indexing/updates and don't optimize the index before copying.  If you use
>> > > rsync for copying the index, only the new/modified files will be copied.
>> > > This is what Solr replication scripts do, too.
>> > >
>> > > Otis
>> > > --
>> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> > >
>> > > ----- Original Message ----
>> > >> From: rahul_k123
>> > >> To: general@lucene.apache.org
>> > >> Sent: Wednesday, August 27, 2008 11:36:07 PM
>> > >> Subject: Re: Replicating Lucene Index with out SOLR
>> > >>
>> > >> Currently we index at regular intervals on A.
>> > >>
>> > >> -copy the index
>> > >>      Copying the whole index every time?
>> > >>
>> > >> Currently I am investigating how I can make use of the SOLR replication
>> > >> scripts to achieve this.
>> > >>
>> > >> Is there anyone who has done this without SOLR before?
>> > >>
>> > >> Thanks
>> > >>
>> > >> Otis Gospodnetic wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > You may want to ask on the java-user list (more subscribers), which I'm
>> > >> > CC-ing, so we can continue discussion there.
>> > >> > I think you will have to implement your own logic that runs on A and
>> > >> > does something like this:
>> > >> >
>> > >> > - stop adding new docs
>> > >> > - call commit on the IndexWriter
>> > >> > - copy the index
>> > >> > - resume indexing
>> > >> >
>> > >> > Otis
>> > >> > --
>> > >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> > >> >
>> > >> > ----- Original Message ----
>> > >> >> From: rahul_k123
>> > >> >> To: general@lucene.apache.org
>> > >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
>> > >> >> Subject: Replicating Lucene Index with out SOLR
>> > >> >>
>> > >> >> I have the following requirement.
>> > >> >>
>> > >> >> Right now we have multiple indexes serving our web application. Our
>> > >> >> indexes are around 30 GB in size.
>> > >> >>
>> > >> >> We want to replicate the index data so that we can use the replicas to
>> > >> >> distribute the search load.
>> > >> >>
>> > >> >> This is what we need ideally:
>> > >> >>
>> > >> >> A – (supports writes and reads)
>> > >> >>
>> > >> >> A1 – replicated index (supports reads). We want to synchronize this
>> > >> >> every 5 mins.
>> > >> >>
>> > >> >> Any help is appreciated. We are not using SOLR.
>> > >> >>
>> > >> >> I am also interested in knowing the best way to scale my application
>> > >> >> by adding more boxes for search if our load increases.
>> > >> >>
>> > >> >> Thanks.
>> > >> >>
>> > >> >> --
>> > >> >> View this message in context:
>> > >> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
>> > >> >> Sent from the Lucene - General mailing list archive at Nabble.com.
>> > >> >
>
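
For the reader swap that the snapinstaller text quoted above describes (open a new reader, warm it, keep serving from the old one in the meantime), a plain Lucene setup could perhaps do something along these lines. This is only a sketch: `SearcherHolder`, `swap` and the `warmer` callback are hypothetical names, and the type parameter `S` stands in for whatever searcher or reader object the application uses (for example an IndexSearcher opened on the freshly synced index).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical holder for illustration; S stands in for a searcher/reader type. */
final class SearcherHolder<S> {
    private final AtomicReference<S> current;

    SearcherHolder(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Search requests always fetch whichever searcher is current. */
    S get() {
        return current.get();
    }

    /**
     * Warms the fresh searcher (e.g. by running a few typical queries) while
     * the old one keeps serving requests, then swaps atomically.
     * Returns the old searcher so the caller can close it once any
     * in-flight searches that still use it have finished.
     */
    S swap(S fresh, Consumer<S> warmer) {
        warmer.accept(fresh);            // old searcher is still current here
        return current.getAndSet(fresh); // new requests now see the fresh one
    }
}
```

The old searcher is handed back rather than closed inside `swap`, since closing it immediately could break searches that are still running against it. How to decide when it is safe to close (reference counting, a grace period, and so on) is left open here.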

--
View this message in context: http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19211497.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org