lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Replicating Lucene Index with out SOLR
Date Thu, 28 Aug 2008 16:57:16 GMT
Yes, I think you pinpointed what I see over and over with Solr.  The two desires pull in opposite
directions.  I think Jason Rutherglen is very keen to start talking about Lucene clusters
and index replication in such clusters without using the classic master/slave approach.

Jason, want to start a thread on java-dev?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: mark harwood <markharw00d@yahoo.co.uk>
> To: java-user@lucene.apache.org
> Sent: Thursday, August 28, 2008 6:21:19 AM
> Subject: Re: Replicating Lucene Index with out SOLR
> 
> >> You don't need to copy the whole index every time
> >> if you do incremental  indexing/updates and don't optimize the index
> 
> 
> But at 5 minute intervals for replication does this not quickly lead to a very 
> fragmented index?
> 
> It seems there is a fundamental conflict when building replication systems based 
> entirely on the lucene file format:
> * In the interests of good search performance the index should ideally be a 
> small number of large files (which is what mergepolicy/optimize are all about 
> maintaining)
> * However, in the interest of minimising replication network traffic, the ideal 
> is a large number of small files.
> 
> I've previously built replication systems which rely on each server pulling 
> deltas in the form of insert/update/delete records from a database and using 
> IndexWriter locally on each server to apply these sets of changes. Obviously 
> this duplicates the analyzing/indexing effort across replicas but does mean the 
> content being transferred is not restricted by the design of the Lucene file 
> format and therefore uses minimal network traffic and places no restrictions on 
> the IndexWriter merge policies I may choose to use to optimise search speed.
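The pull-based scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ChangeType`, `Delta` and `ReplicaIndex` are hypothetical, the change log is assumed to carry a strictly increasing sequence number and to be read in ascending order, and an in-memory map stands in for the local Lucene index so the example runs without dependencies. In a real replica the two commented lines would presumably call `IndexWriter.updateDocument` and `IndexWriter.deleteDocuments` instead.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Kind of change recorded in the (hypothetical) change-log table. */
enum ChangeType { INSERT, UPDATE, DELETE }

/** One change-log row: sequence number, document key, and new content. */
final class Delta {
    final long seq;
    final ChangeType type;
    final String docId;
    final String body; // null for DELETE

    Delta(long seq, ChangeType type, String docId, String body) {
        this.seq = seq;
        this.type = type;
        this.docId = docId;
        this.body = body;
    }
}

/** Stand-in for one replica's local index (assumption: a map instead of Lucene). */
final class ReplicaIndex {
    private final Map<String, String> docs = new TreeMap<>();
    private long lastAppliedSeq = 0;

    /**
     * Applies deltas in the order given (assumed ascending by seq), skipping
     * any that were already applied. Returns the number actually applied.
     */
    int apply(List<Delta> deltas) {
        int applied = 0;
        for (Delta d : deltas) {
            if (d.seq <= lastAppliedSeq) {
                continue; // already applied in an earlier pull
            }
            switch (d.type) {
                case INSERT:
                case UPDATE:
                    docs.put(d.docId, d.body); // real replica: updateDocument(...)
                    break;
                case DELETE:
                    docs.remove(d.docId);      // real replica: deleteDocuments(...)
                    break;
            }
            lastAppliedSeq = d.seq;
            applied++;
        }
        return applied;
    }

    /** Highest sequence number applied so far; the next pull starts after it. */
    long lastAppliedSeq() { return lastAppliedSeq; }

    String get(String docId) { return docs.get(docId); }
}
```

Each replica only needs to remember `lastAppliedSeq()` and ask the database for rows after it, so a batch that is delivered twice is simply skipped.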
> 
> Keen to explore the pros and cons of these different replication schemes.
> 
> Cheers,
> Mark
> 
> 
> 
> --- On Thu, 28/8/08, rahul_k123 wrote:
> 
> > From: rahul_k123 
> > Subject: Re: Replicating Lucene Index with out SOLR
> > To: java-user@lucene.apache.org
> > Date: Thursday, 28 August, 2008, 6:47 AM
> > Can I make use of the Solr scripts for this purpose?
> > 
> > 
> > The snapinstaller runs on the slave after a snapshot has
> > been pulled from
> > the master. This signals the local Solr server to open a
> > new index reader,
> > then auto-warming of the cache(s) begins (in the new
> > reader), while other
> > requests continue to be served by the original index
> > reader.
> > 
> > How can I achieve the above in my case?
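Outside Solr, the warm-then-swap behaviour quoted above might be approximated along these lines. This is a sketch, not what snapinstaller actually does: `SearcherHolder` is a hypothetical name, the type parameter `S` stands for whatever searcher or reader object the application uses, and the warm-up is just a callback that would typically run a few representative queries.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Holds the searcher that request threads should use right now. */
final class SearcherHolder<S> {
    private final AtomicReference<S> current;

    SearcherHolder(S initial) {
        current = new AtomicReference<>(initial);
    }

    /** Searcher used for incoming requests. */
    S get() {
        return current.get();
    }

    /**
     * Warms the fresh searcher while the old one keeps serving requests,
     * then swaps atomically. Returns the old searcher so the caller can
     * close it once in-flight requests have finished.
     */
    S install(S fresh, Consumer<S> warmer) {
        warmer.accept(fresh);            // old searcher still answers get()
        return current.getAndSet(fresh); // new requests now see the fresh one
    }
}
```

The sketch deliberately leaves out reference counting, so deciding when it is safe to close the returned old searcher is up to the caller.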
> > 
> > 
> > Otis Gospodnetic wrote:
> > > 
> > > You don't need to copy the whole index every time if you do incremental
> > > indexing/updates and don't optimize the index before copying.  If you use
> > > rsync for copying the index, only the new/modified files will be copied.  This
> > > is what Solr replication scripts do, too.
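To make the "only new/modified files" point concrete, here is a small sketch of the comparison involved. It is not how rsync is implemented (rsync's default quick check compares size and modification time); `SyncPlan` is a hypothetical name, the maps of file name to size are assumed to come from listing the two index directories, and the file names in the usage note are purely illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Files a replica would have to fetch or drop to match the master. */
final class SyncPlan {
    final Set<String> toCopy = new TreeSet<>();
    final Set<String> toDelete = new TreeSet<>();

    /** Compares directory listings given as file name -> size in bytes. */
    static SyncPlan diff(Map<String, Long> master, Map<String, Long> replica) {
        SyncPlan plan = new SyncPlan();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long replicaSize = replica.get(e.getKey());
            // Missing on the replica, or present with a different size.
            if (replicaSize == null || !replicaSize.equals(e.getValue())) {
                plan.toCopy.add(e.getKey());
            }
        }
        for (String name : replica.keySet()) {
            // No longer part of the master's index.
            if (!master.containsKey(name)) {
                plan.toDelete.add(name);
            }
        }
        return plan;
    }
}
```

With a master holding `_1.cfs`, `_2.cfs` and `segments_3` and a replica holding `_0.cfs`, `_1.cfs` and `segments_2`, the plan copies only `_2.cfs` and `segments_3`. After an optimize the master would hold one new large file, so the plan would amount to copying the whole index again.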
> > > 
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > 
> > > 
> > > 
> > > ----- Original Message ----
> > >> From: rahul_k123 
> > >> To: general@lucene.apache.org
> > >> Sent: Wednesday, August 27, 2008 11:36:07 PM
> > >> Subject: Re: Replicating Lucene Index with out SOLR
> > >> 
> > >> 
> > >> Currently we index every certain amount of time on A.
> > >> 
> > >> -copy the index
> > >>      Copying the whole index every time?
> > >> 
> > >> Currently I am investigating how I can make use of the SOLR replication
> > >> scripts to achieve this.
> > >> 
> > >> 
> > >> Is there anyone who did this without SOLR before?
> > >> 
> > >> 
> > >> Thanks
> > >> 
> > >> 
> > >> 
> > >> Otis Gospodnetic wrote:
> > >> > 
> > >> > Hi,
> > >> > 
> > >> > You may want to ask on the java-user list (more subscribers), which I'm
> > >> > CC-ing, so we can continue discussion there.
> > >> > I think you will have to implement your own logic that runs on A and does
> > >> > something like this:
> > >> > 
> > >> > - stop adding new docs
> > >> > - call commit on the IndexWriter
> > >> > 
> > >> > - copy the index
> > >> > - resume indexing
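The four steps above could be wired together roughly like this. It is a sketch under assumptions rather than a recommended implementation: `IndexingGate` is a hypothetical helper, and the commit and copy actions are passed in as callbacks (for example, a call to the IndexWriter's commit followed by an rsync invocation).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Lets indexing threads run concurrently but pauses them during a snapshot. */
final class IndexingGate {
    // Fair mode so a waiting snapshot is not starved by a stream of indexers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /** Runs one indexing operation; many of these may run at the same time. */
    void index(Runnable addDocs) {
        lock.readLock().lock();
        try {
            addDocs.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Waits for in-flight indexing, then commits and copies with indexing paused. */
    void snapshot(Runnable commit, Runnable copyIndex) {
        lock.writeLock().lock();       // stop adding new docs
        try {
            commit.run();              // call commit on the IndexWriter
            copyIndex.run();           // copy the index
        } finally {
            lock.writeLock().unlock(); // resume indexing
        }
    }
}
```

Indexing stays blocked for the whole duration of the copy, which matters for a 30 GB index. Copying only the changed files keeps that pause shorter.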
> > >> > 
> > >> > Otis
> > >> > --
> > >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >> > 
> > >> > 
> > >> > 
> > >> > ----- Original Message ----
> > >> >> From: rahul_k123 
> > >> >> To: general@lucene.apache.org
> > >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
> > >> >> Subject: Replicating Lucene Index with out SOLR
> > >> >> 
> > >> >> 
> > >> >> I have the following requirement.
> > >> >> 
> > >> >> Right now we have multiple indexes serving our web application. Our
> > >> >> indexes are around 30 GB in size.
> > >> >> 
> > >> >> We want to replicate the index data so that we can use the replicas to
> > >> >> distribute the search load.
> > >> >> 
> > >> >> This is what we need ideally:
> > >> >> 
> > >> >> A – (supports writes and reads)
> > >> >> 
> > >> >> A1 – Replicated index (supports reads). We want to synchronize this
> > >> >> every 5 mins.
> > >> >> 
> > >> >> Any help is appreciated. We are not using SOLR.
> > >> >> 
> > >> >> I am also interested in knowing the best way to scale my application by
> > >> >> adding more boxes for search if our load increases.
> > >> >> 
> > >> >> Thanks.
> > >> >> 
> > >> >> -- 
> > >> >> View this message in context:
> > >> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
> > >> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> > >> > 
> > >> > 
> > >> > 
> > >> 
> > >> -- 
> > >> View this message in context:
> > >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19193670.html
> > >> Sent from the Lucene - General mailing list archive at Nabble.com.
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > 
> > > 
> > > 
> > 
> > -- 
> > View this message in context:
> > http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19194576.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> > 
> > 
> 
> 
> 

