lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Replicating Lucene Index with out SOLR
Date Thu, 28 Aug 2008 10:21:19 GMT
>> You don't need to copy the whole index every time
>> if you do incremental indexing/updates and don't optimize the index


But with replication at 5-minute intervals, does this not quickly lead to a very fragmented
index?

It seems there is a fundamental conflict when building replication systems based entirely
on the Lucene file format:
* In the interests of good search performance, the index should ideally be a small number of
large files (which is what MergePolicy/optimize are all about maintaining)
* However, in the interests of minimising replication network traffic, the ideal is a large
number of small files, because every merge rewrites already-replicated data into a new file
that has to be copied again.
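
To put rough numbers on the two bullets above, here is a minimal sketch (plain Java, not Lucene code). It assumes index files are write-once, so a file-level copy such as rsync only has to send files whose names the replica does not already have. The class name, file names and sizes are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; file names and sizes (arbitrary units) are made up.
class ReplicationCost {

    // Size a file-level sync would send: every master file the replica lacks.
    // Assumes index files are write-once, so a shared name means shared content.
    static long bytesToTransfer(Map<String, Long> master, Map<String, Long> replica) {
        long total = 0;
        for (Map.Entry<String, Long> e : master.entrySet()) {
            if (!replica.containsKey(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> replica = new TreeMap<>();
        replica.put("_1.cfs", 1000L);
        replica.put("_2.cfs", 10L);

        // Case 1: the master added one small segment and did not merge.
        Map<String, Long> unmerged = new TreeMap<>(replica);
        unmerged.put("_3.cfs", 10L);

        // Case 2: the master merged everything into a single new segment.
        Map<String, Long> merged = new TreeMap<>();
        merged.put("_4.cfs", 1020L);

        System.out.println(bytesToTransfer(unmerged, replica)); // prints 10
        System.out.println(bytesToTransfer(merged, replica));   // prints 1020
    }
}
```

In this toy case the unmerged index costs 10 units to replicate, while the merged one costs the full 1020, even though both hold the same documents.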

I've previously built replication systems in which each server pulls deltas, in the form of
insert/update/delete records, from a database and uses IndexWriter locally to apply these
sets of changes. Obviously this duplicates the analyzing/indexing effort across replicas, but
it means the content being transferred is not constrained by the design of the Lucene file
format. It therefore uses minimal network traffic and places no restrictions on the
IndexWriter merge policies I may choose to use to optimise search speed.
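
A minimal sketch of the delta-pull idea described above, under stated assumptions: the Delta record, its sequence number and the in-memory map are all hypothetical stand-ins, and inserts and updates are collapsed into a single "upsert". With a real index, an upsert would likely map to IndexWriter.updateDocument(Term, Document) and a delete to IndexWriter.deleteDocuments(Term); the map is used here only so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names here are illustrative, not Lucene API.
class DeltaReplica {
    enum Op { UPSERT, DELETE }

    // One change record as it might be read from the database (assumed schema).
    static final class Delta {
        final long seq;     // monotonically increasing change number
        final Op op;
        final String id;    // unique document key
        final String body;  // document content; ignored for DELETE

        Delta(long seq, Op op, String id, String body) {
            this.seq = seq;
            this.op = op;
            this.id = id;
            this.body = body;
        }
    }

    // In-memory stand-in for the local index on this replica.
    final Map<String, String> index = new HashMap<>();
    long lastApplied = 0;

    // Applies deltas with seq > lastApplied; assumes the list is sorted by seq.
    // Returns the number of records applied.
    int apply(List<Delta> deltas) {
        int applied = 0;
        for (Delta d : deltas) {
            if (d.seq <= lastApplied) {
                continue; // already applied on an earlier poll
            }
            if (d.op == Op.DELETE) {
                index.remove(d.id);
            } else {
                index.put(d.id, d.body);
            }
            lastApplied = d.seq;
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        DeltaReplica replica = new DeltaReplica();
        List<Delta> batch = new ArrayList<>();
        batch.add(new Delta(1, Op.UPSERT, "doc1", "first version"));
        batch.add(new Delta(2, Op.UPSERT, "doc2", "another doc"));
        batch.add(new Delta(3, Op.UPSERT, "doc1", "second version"));
        batch.add(new Delta(4, Op.DELETE, "doc2", null));
        System.out.println(replica.apply(batch)); // prints 4
        System.out.println(replica.apply(batch)); // prints 0, replay is a no-op
        System.out.println(replica.index);        // prints {doc1=second version}
    }
}
```

Each replica only needs to remember the last sequence number it applied, so what crosses the network is the change records themselves rather than index files.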

Keen to explore the pros and cons of these different replication schemes.

Cheers,
Mark



--- On Thu, 28/8/08, rahul_k123 <vishnudeepak@gmail.com> wrote:

> From: rahul_k123 <vishnudeepak@gmail.com>
> Subject: Re: Replicating Lucene Index with out SOLR
> To: java-user@lucene.apache.org
> Date: Thursday, 28 August, 2008, 6:47 AM
> Can I make use of the Solr scripts for this purpose?
> 
> 
> The snapinstaller runs on the slave after a snapshot has been pulled from
> the master. This signals the local Solr server to open a new index reader,
> then auto-warming of the cache(s) begins (in the new reader), while other
> requests continue to be served by the original index reader.
> 
> How can I achieve the above in my case?
> 
> 
> Otis Gospodnetic wrote:
> > 
> > You don't need to copy the whole index every time if you do incremental
> > indexing/updates and don't optimize the index before copying.  If you use
> > rsync for copying the index, only the new/modified files will be copied.  This
> > is what Solr replication scripts do, too.
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: rahul_k123 <vishnudeepak@gmail.com>
> >> To: general@lucene.apache.org
> >> Sent: Wednesday, August 27, 2008 11:36:07 PM
> >> Subject: Re: Replicating Lucene Index with out SOLR
> >> 
> >> 
> >> Currently we index on A at regular intervals.
> >> 
> >> -copy the index
> >>      Copying the whole index every time?
> >> 
> >> Currently I am investigating how I can make use of the SOLR replication
> >> scripts to achieve this.
> >> 
> >> 
> >> Is there anyone who did this without SOLR before?
> >> 
> >> 
> >> Thanks
> >> 
> >> 
> >> 
> >> Otis Gospodnetic wrote:
> >> > 
> >> > Hi,
> >> > 
> >> > You may want to ask on the java-user list (more subscribers), which I'm
> >> > CC-ing, so we can continue discussion there.
> >> > I think you will have to implement your own logic that runs on A and does
> >> > something like this:
> >> > 
> >> > - stop adding new docs
> >> > - call commit on the IndexWriter
> >> > 
> >> > - copy the index
> >> > - resume indexing
> >> > 
> >> > Otis
> >> > --
> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> > 
> >> > 
> >> > 
> >> > ----- Original Message ----
> >> >> From: rahul_k123 
> >> >> To: general@lucene.apache.org
> >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
> >> >> Subject: Replicating Lucene Index with out SOLR
> >> >> 
> >> >> 
> >> >> I have the following requirement
> >> >> 
> >> >> Right now we have multiple indexes serving our web application. Our indexes
> >> >> are around 30 GB in size.
> >> >> 
> >> >> We want to replicate the index data so that we can use them to distribute
> >> >> the search load.
> >> >> 
> >> >> This is what we need ideally.
> >> >> 
> >> >> A – (supports writes and reads)
> >> >> 
> >> >> A1 – Replicated index (supports reads). We want to synchronize this every 5
> >> >> mins.
> >> >> 
> >> >> 
> >> >> 
> >> >> Any help is appreciated.   We are not using SOLR
> >> >> 
> >> >> I am also interested in knowing what will be the best way so that I can scale
> >> >> my application by adding more boxes for search if our load increases.
> >> >> 
> >> >> Thanks.  
> >> >> 
> >> >> -- 
> >> >> View this message in context:
> >> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
> >> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> >> > 
> >> > 
> >> > 
> >> 
> >> -- 
> >> View this message in context:
> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19193670.html
> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> > 
> > 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19194576.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 




