lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Lucene directory copy - master copy to local index
Date Tue, 23 Jun 2009 10:08:46 GMT
Hi


You could look at the Solr scripts, I guess (the ones that use rsync, not
new java only method).

But it isn't that hard.  See the rsync docs for options and how to get it up
and running (email me off list if you like - that has nothing to do with
lucene) then you need to fire off an rsync copy at the right time.  Mike is
right of course in that you need to tie it in with your indexing job so that
you only copy indexes in known, consistent states.

One way would be for your master updater process to close the index writer
after every n updates or every x minutes or y hours, or whatever makes sense
for you, run rsync, then start updating again.  Your search slaves can check
index version or be notified directly or, again, whatever makes sense for
you.

Hope that helps.


--
Ian.

On Tue, Jun 23, 2009 at 10:35 AM, Amin Mohammed-Coleman <aminmc@gmail.com>wrote:

Hi

>
> Sorry for sending the below..what I meant to say was is there any
> documentation that I can be pointed to with using lucene and rsync?  I been
> up since 2am so brain slowing down really quickly...
>
> Cheers
> Amin
>
> On Tue, Jun 23, 2009 at 10:13 AM, Amin Mohammed-Coleman <aminmc@gmail.com
> >wrote:
>
> > Hi
> >
> > Thanks for your replies.  Is there any documentation that I can look at
> for
> > using rsync?  I am thinking of creating my own solution (early days) I
> might
> > come back to Solr.
> >
> >
> > Cheers
> > Amin
> >
> >
> > On Mon, Jun 22, 2009 at 10:24 AM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> Solr (as of 1.4) is moving to Java only implementation for replication
> >> (http://wiki.apache.org/solr/SolrReplication).
> >>
> >> If you roll your own, make sure to only run rsync at "proper" times
> >> according to the master, eg, it's probably simplest to close the
> >> writer on the master before running rsync.  If you don't want to pause
> >> indexing when copying, then you should use SnapshotDeletionPolicy to
> >> get the list of files that need copying and pass that list to rsync.
> >> If you simply rsync whenever you want to, and a writer is open on the
> >> index, it's easy to get a corrupt copy.
> >>
> >> Mike
> >>
> >> On Mon, Jun 22, 2009 at 5:09 AM, Ian Lea<ian.lea@gmail.com> wrote:
> >> > Or if you don't want to use Solr, rsync is very good for this.  Only
> >> takes
> >> > changes, takes care of deletions, very robust.  That is what Solr
> uses.
> >> >
> >> > See http://wiki.apache.org/solr/CollectionDistribution for details or
> >> ideas.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> > On Sun, Jun 21, 2009 at 4:32 AM, Otis Gospodnetic <
> >> > otis_gospodnetic@yahoo.com> wrote:
> >> >
> >> >>
> >> >> Hi Amin,
> >> >>
> >> >> Have a look at Solr, it may be what you are after:
> >> >> http://lucene.apache.org/solr/
> >> >>
> >> >> Otis
> >> >> --
> >> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message ----
> >> >> > From: Amin Mohammed-Coleman <aminmc@gmail.com>
> >> >> > To: java-user@lucene.apache.org
> >> >> > Sent: Saturday, June 20, 2009 1:32:18 PM
> >> >> > Subject: Lucene directory copy - master copy to local index
> >> >> >
> >> >> > Hi
> >> >> > I am prototyping the following situation:
> >> >> >
> >> >> > 1) Multiple nodes in a cluster
> >> >> > 2) Each node has a local index
> >> >> > 3) Search requests are maded against the local index
> >> >> > 4) Index updates are sent to a JMS where a master process adds
> >> document
> >> >> to
> >> >> > index
> >> >> > 5) Each node is configured to check whether the local index is
out
> of
> >> >> date
> >> >> > and needs to copy from master index.
> >> >> >
> >> >> > I have created a class that will be configured (quartz job) to
copy
> >> >> master
> >> >> > directory files to the local directory.  I was wondering if I
could
> >> get
> >> >> > advice on whether this is the best way to copy master directory
> >> changes
> >> >> to
> >> >> > the local.
> >> >> >
> >> >> > public class LocalIndexProvider {
> >> >> >
> >> >> > private Directory masterDirectory;
> >> >> >
> >> >> > private Directory localDirectory;
> >> >> >
> >> >> > private final Logger LOGGER = Logger.getLogger(getClass());
> >> >> >
> >> >> > public LocalIndexProvider(Directory masterDirectory, Directory
> >> >> > localDirectory) {
> >> >> >
> >> >> > this.masterDirectory  = masterDirectory;
> >> >> >
> >> >> > this.localDirectory = localDirectory;
> >> >> >
> >> >> > }
> >> >> >
> >> >> > public void updateLocalIndex() throws Exception {
> >> >> >
> >> >> > StopWatch stopWatch = new StopWatch("copy-time");
> >> >> >
> >> >> > stopWatch.start();
> >> >> >
> >> >> > String[] masterFiles = masterDirectory.list();
> >> >> >
> >> >> > String[] localFiles = localDirectory.list();
> >> >> >
> >> >> > Lock lock = localDirectory.getLockFactory().makeLock("test");
> >> >> >
> >> >> > lock.obtain();
> >> >> >
> >> >> > try {
> >> >> >
> >> >> > if (localFiles.length != masterFiles.length) {
> >> >> >
> >> >> > Directory.copy(masterDirectory, localDirectory, false);
> >> >> >
> >> >> > }
> >> >> >
> >> >> > }finally {
> >> >> >
> >> >> > lock.release();
> >> >> >
> >> >> > }
> >> >> >
> >> >> > stopWatch.stop();
> >> >> >
> >> >> > LOGGER.debug("total time taken :" +
> stopWatch.getTotalTimeMillis());
> >> >> >
> >> >> > }
> >> >> >
> >> >> > }
> >> >> >
> >> >> >
> >> >> >
> >> >> > The above is just a proof of concept and I am open to any comments
> >> about
> >> >> it.
> >> >> > It may be inefficient or it may be.  I have looked at the source
> code
> >> >> from
> >> >> > hibernate search and it seems as though they copy each file from
> the
> >> >> > directory (master to local).  I'm not sure whether this is
> something
> >> I
> >> >> need
> >> >> > to do or the above (with more refinement) would be enough.
> >> >> >
> >> >> > Any help would be highly appreciated.
> >> >> >
> >> >> > Cheers
> >> >> > Amin
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message