lucene-java-user mailing list archives

From Badano Andrea <Andrea.Bad...@sweco.se>
Subject ControlledRealTimeReopenThread
Date Mon, 01 Dec 2014 21:22:14 GMT

Hello,

My apologies for a longish question.

I am having some problems with a class that tries to ensure that a Lucene index is
always kept up to date with the contents of a MySQL master database. Users add,
modify, and delete items in the master database, and all changes to the master
database are immediately propagated to the index. When the application starts up,
all items present in the master database that are not present in the index are
added to the index. Similarly, all items present in the index that are not present
in the master database are removed from the index.
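
The startup reconciliation described above boils down to two set differences over document identifiers. The following is only a minimal sketch, assuming each item has a unique string ID stored both in the database and in the index; the class and method names (IndexReconciler, toAdd, toRemove) are hypothetical and not part of the application discussed here.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the original application.
public class IndexReconciler {

    // IDs present in the master database but missing from the index.
    public static Set<String> toAdd(Set<String> dbIds, Set<String> indexIds) {
        Set<String> result = new TreeSet<String>(dbIds);
        result.removeAll(indexIds);
        return result;
    }

    // IDs present in the index but no longer in the master database.
    public static Set<String> toRemove(Set<String> dbIds, Set<String> indexIds) {
        Set<String> result = new TreeSet<String>(indexIds);
        result.removeAll(dbIds);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample IDs, for illustration only.
        Set<String> dbIds = new TreeSet<String>(Arrays.asList("1", "2", "3"));
        Set<String> indexIds = new TreeSet<String>(Arrays.asList("2", "3", "4"));
        System.out.println("add to index:      " + toAdd(dbIds, indexIds));
        System.out.println("remove from index: " + toRemove(dbIds, indexIds));
    }
}
```

Each ID returned by toRemove would then correspond to one Term passed to the delete call.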

I am trying to do this with code based on http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
Automatically copying data from the master database to the index seems to work.
However, removing items from the index that are no longer present in the database does
not seem to work.

So I have this class:

class IndexWrapper {
  private final IndexWriter _iw;
  private final TrackingIndexWriter _triw;
  private final ReferenceManager<IndexSearcher> _rmgr;
  private final ControlledRealTimeReopenThread<IndexSearcher> _reopen;
  private final Analyzer _analyzer;
  private AtomicLong _gen;
  ...
}

that is set up as follows:

_iw = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer));
_triw = new TrackingIndexWriter(_iw);
_rmgr = new SearcherManager(_iw, true, null);
_reopen = new ControlledRealTimeReopenThread<IndexSearcher>(_triw,_rmgr, 60.00, 0.1);
_analyzer = analyzer;
_gen = new AtomicLong(_triw.getGeneration());
_reopen.start();

First some code that fetches every doc in the index is called:

_reopen.waitForGeneration(_gen.get()); // wait until the index is re-opened for the last update
IndexSearcher searcher = _rmgr.acquire();
try {
  ... fetch all documents in index ...
}
finally {
  _rmgr.release(searcher);
}

This returns all docs in the index. Later on, there is an attempt to remove some of these
documents (the ones that no longer exist in the master database):

long curr = _gen.get();
_gen.compareAndSet(curr, _triw.deleteDocuments(termToRemove));
_iw.commit();
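
A side note on the generation bookkeeping in the snippet above: a single compareAndSet silently drops the new generation if another thread changed the value in between. The following is a minimal sketch of a retry loop that never moves the stored generation backwards. GenerationTracker, record, and latest are hypothetical names, and the sketch does not touch Lucene or explain whether the delete matches any documents.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; the class and method names are not from the original code.
public class GenerationTracker {
    private final AtomicLong gen = new AtomicLong();

    // Record a generation, never moving the stored value backwards.
    // Returns the value held after the call.
    public long record(long newGen) {
        while (true) {
            long curr = gen.get();
            if (newGen <= curr) {
                return curr; // an equal or newer generation is already stored
            }
            if (gen.compareAndSet(curr, newGen)) {
                return newGen;
            }
            // Another thread changed the value in the meantime; retry.
        }
    }

    public long latest() {
        return gen.get();
    }

    public static void main(String[] args) {
        GenerationTracker tracker = new GenerationTracker();
        tracker.record(5);
        tracker.record(3); // older generation, ignored
        System.out.println(tracker.latest());
    }
}
```

In the code above, the value returned by _triw.deleteDocuments(termToRemove) would be passed to record, and latest() would supply the argument for waitForGeneration.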

This code runs without any exceptions being thrown, but it does not seem to remove anything.
If I enable logging, I see things such as:

DW : anyChanges? numDocsInRam=0 deletes=false hasTickets:false pendingChangesInFullFlush: false

Supposedly the printout

numDocsInRam=0

means that commit() has not found any documents to delete. Also, if I add some extra logging
to IndexWriter.deleteDocuments() like so:

public void deleteDocuments(Term... terms) throws IOException {
  ensureOpen();
  try {
    boolean dt = docWriter.deleteTerms(terms);
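    // Arrays.toString prints every term; passing the array directly would expand it as printf varargs
    System.err.printf("DELETING TERMS : %s\n", java.util.Arrays.toString(terms));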
    System.err.printf("DT : %s\n", dt);
    if (dt) {
      processEvents(true, false);
    }
  } catch (OutOfMemoryError oom) {
    tragicEvent(oom, "deleteDocuments(Term..)");
  }
}

I can see printouts such as:

DT : false

So, an IndexWriter is given to a ReferenceManager, which is then used to create an IndexSearcher
that returns a set of documents. Yet later, when an attempt is made to remove some of these
documents, the IndexWriter (or rather, its docWriter) cannot find them. Assuming that the
IndexWriter is somehow involved in the initial fetch of all documents, I am confused as to how
the IndexWriter, a short while later, cannot find some of the documents that have been marked
(by my application) for deletion. I am pretty sure that the Term objects that are passed into
deleteDocuments() are compatible with the documents previously returned by the IndexSearcher.
So have I misunderstood the role of the IndexWriter as some kind of central gateway to all
documents?

Andrea
