lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shalin Shekhar Mangar" <shalinman...@gmail.com>
Subject Re: How can we know if 2 lucene indexes are same?
Date Fri, 05 Sep 2008 09:19:49 GMT
Let me try to explain.

I have a master where indexing is done. I have multiple slaves for querying.

If I commit+optimize on the master and then rsync the index, the data
transferred on the network is huge. An alternate way is to commit on master,
transfer the delta to the slave and issue an optimize on the slave. This is
very fast because less data is transferred on the network.

However, we need to ensure that the index on the slave is actually in sync
with the master. So that on another commit, we can blindly transfer the
delta to the slave.

On Fri, Sep 5, 2008 at 1:03 PM, 叶双明 <yeshuangming@gmail.com> wrote:

> Do you use index at the slave as a backup for index at the master??
> And in case the master break down, you can turn the query to the slave??
>
> When add a Document to master, also add it to the slave?
>
> Sorry, I don't clear about what your problem, can you show more detail
> about
> what do you worry about?
>
> 2008/9/5 Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@gmail.com>
>
> > I am not using the same index with different writers. These are two
> > separate indexes both have their own reader/writer
> >
> > I just wanted to minimize the network load by avoiding the download of
> > an optimized index if the contents are indeed same.
> > --noble
> >
> > On Thu, Sep 4, 2008 at 7:36 PM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> > >
> > > Sorry, I should have said: you must always use the same writer, ie as
> of
> > > 2.3, while IndexWriter.optimize (or normal segment merging) is running,
> > > under one thread, another thread can use that *same* writer to
> > > add/delete/update documents, and both are free to make changes to the
> > index.
> > >
> > > Before 2.3, optimize() was fully synchronized and blocked
> > add/update/delete
> > > documents from changing the index until the optimize() call completed.
> > >
> > > So, your test is expected to fail: you're not allowed to open 2
> "writers"
> > on
> > > a single index at the same time, where "writer" includes an IndexReader
> > that
> > > deletes documents; so those exceptions (LockObtainFailed, StaleReader)
> > are
> > > expected.
> > >
> > > Mike
> > >
> > > 叶双明 wrote:
> > >
> > >> I don't agreed with Michael McCandless. :)
> > >>
> > >> I konw that after 2.3, add and delete can run in one IndexWriter at
> one
> > >> time, and also lucene has a update method which delete documents by
> term
> > >> then add the new document.
> > >>
> > >> In my test, either LockObtainFailedException with thread sleep
> sentence:
> > >>
> > >> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> > out:
> > >> SimpleFSLock@E:\index\write.lock
> > >> at org.apache.lucene.store.Lock.obtain(Lock.java:85)
> > >> at
> > >>
> > >>
> >
> org.apache.lucene.index.DirectoryIndexReader.acquireWriteLock(DirectoryIndexReader.java:298)
> > >> at
> > >>
> org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:750)
> > >> at
> > >>
> > org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:786)
> > >> at org.test.IndexThread.run(IndexThread.java:33)
> > >>
> > >> or StaleReaderException without thread sleep sentence:
> > >>
> > >> org.apache.lucene.index.StaleReaderException: IndexReader out of date
> > and
> > >> no
> > >> longer valid for delete, undelete, or setNorm operations
> > >> at
> > >>
> > >>
> >
> org.apache.lucene.index.DirectoryIndexReader.acquireWriteLock(DirectoryIndexReader.java:308)
> > >> at
> > >>
> org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:750)
> > >> at
> > >>
> > org.apache.lucene.index.IndexReader.deleteDocuments(IndexReader.java:786)
> > >> at org.test.IndexThread.run(IndexThread.java:31)
> > >>
> > >> My test code:
> > >>
> > >>
> > >> public class Main {
> > >>
> > >> public static void main(String[] args) throws IOException {
> > >>  Directory directory = FSDirectory.getDirectory("e:/index");
> > >>  IndexWriter writer = new IndexWriter(directory, null, false);
> > >>  Document document = new Document();
> > >>  document.add(new Field("bbb", "bbb", Store.YES, Index.UN_TOKENIZED));
> > >>  writer.addDocument(document);
> > >>
> > >>  Thread t = new IndexThread();
> > >>  t.start();
> > >>
> > >>  try {
> > >>  Thread.sleep(1000);
> > >>  } catch (InterruptedException e) {
> > >>  // TODO Auto-generated catch block
> > >>  e.printStackTrace();
> > >>  }
> > >>
> > >>  writer.optimize();
> > >>  writer.close();
> > >>  System.out.println("out");
> > >> }
> > >> }
> > >>
> > >> public class IndexThread extends Thread {
> > >>
> > >> @Override
> > >> public void run() {
> > >>  Directory directory;
> > >>  try {
> > >>  try {
> > >>   Thread.sleep(10);
> > >>  } catch (InterruptedException e) {
> > >>   // TODO Auto-generated catch block
> > >>   e.printStackTrace();
> > >>  }
> > >>
> > >>  directory = FSDirectory.getDirectory("e:/index");
> > >>  System.out.println("thread begin");
> > >>  //IndexWriter reader = new IndexWriter(directory, null, false);
> > >>  IndexReader reader = IndexReader.open(directory);
> > >>  Term term = new Term("bbb", "bbb");
> > >>  reader.deleteDocuments(term);
> > >>  reader.close();
> > >>  System.out.println("thread end");
> > >>  } catch (IOException e) {
> > >>  // TODO Auto-generated catch block
> > >>  e.printStackTrace();
> > >>  }
> > >> }
> > >> }
> > >>
> > >>
> > >>
> > >> 2008/9/4, Michael McCandless <lucene@mikemccandless.com>:
> > >>>
> > >>>
> > >>> Actually, as of 2.3, this is no longer true: merges and optimizing
> run
> > in
> > >>> the background, and allow add/update/delete documents to run at the
> > same
> > >>> time.
> > >>>
> > >>> I think it's probably best to use application logic (outside of
> Lucene)
> > >>> to
> > >>> keep track of what updates happened to the master while the slave was
> > >>> optimizing.
> > >>>
> > >>> Mike
> > >>>
> > >>> 叶双明 wrote:
> > >>>
> > >>> No documents can added into index when the index is optimizing,  or
> > >>>>
> > >>>> optimizing can't run durling documents adding to the index.
> > >>>> So, without other error, I think we can beleive the two index are
> > indeed
> > >>>> the
> > >>>> same.
> > >>>>
> > >>>> :)
> > >>>>
> > >>>> 2008/9/4 Noble Paul നോബിള്‍ नोब्ळ् <noble.paul@gmail.com>
> > >>>>
> > >>>> The use case is as follows
> > >>>>>
> > >>>>> I have two indexes . One at the master and one at the slave.
The
> user
> > >>>>> occasionally keeps committing on the master and the delta is
> > >>>>> replicated everytime. But when the optimize happens the transfer
> size
> > >>>>> can be really large. So I am thinking of  doing the optimize
> > >>>>> separately on master and slave .
> > >>>>>
> > >>>>> So far, so good. But how can I really know that after the optimize
> > the
> > >>>>> indexes are indeed the same or no documents got added in between.?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Aug 29, 2008 at 3:13 PM, Karl Wettin <
> karl.wettin@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>>
> > >>>>>> 29 aug 2008 kl. 11.35 skrev Noble Paul നോബിള്‍
नोब्ळ्:
> > >>>>>>
> > >>>>>> hi,
> > >>>>>>>
> > >>>>>>> I wish to know if the contents of two indexes have
same data.
> > >>>>>>> will all the files be exactly same if I put same set
of documents
> > to
> > >>>>>>>
> > >>>>>> both?
> > >>>>>
> > >>>>>>
> > >>>>>> If you insert the documents in the same order with the
same
> settings
> > >>>>>> and
> > >>>>>> both indices are optimized, then the files ought to be
> identitical.
> > >>>>>> I'm
> > >>>>>> however not sure.
> > >>>>>>
> > >>>>>> The instantiated index contrib module contains a test that
assert
> > two
> > >>>>>>
> > >>>>> index
> > >>>>>
> > >>>>>> readers are identical. You could use this to be really
sure, but
> it
> > it
> > >>>>>> a
> > >>>>>> rather long running process for a large index:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> >
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestIndicesEquals.java
> > >>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Perhaps you should explain why you need to do this.
> > >>>>>>
> > >>>>>>
> > >>>>>>      karl
> > >>>>>>
> > ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> --Noble Paul
> > >>>>>
> > >>>>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>
> > >>>
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> >
> > --
> > --Noble Paul
> >
>



-- 
Regards,
Shalin Shekhar Mangar.
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message