lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: ControlledRealTimeReopenThread
Date Tue, 02 Dec 2014 09:36:53 GMT
TextField is dangerous: it is analyzed, possibly into more than one
token, and then your deletes won't work.  It's safer to use
StringField for values you later want to delete by.

Try making a standalone test that just deletes documents first...
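
For example, a rough standalone sketch of such a test (untested, against
Lucene 4.10; the field value "John Smith" and the class name
DeleteByTermCheck are made up for illustration):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.util.Version;

  public class DeleteByTermCheck {
    public static void main(String[] args) throws Exception {
      RAMDirectory dir = new RAMDirectory();
      IndexWriter iw = new IndexWriter(dir,
          new IndexWriterConfig(Version.LUCENE_4_10_2,
              new StandardAnalyzer(Version.LUCENE_4_10_2)));

      Document d = new Document();
      // StringField indexes the value as one un-analyzed token, so an exact
      // Term("name", "John Smith") matches it.  A TextField with the same
      // value would be analyzed into "john" and "smith", and
      // deleteDocuments(new Term("name", "John Smith")) would delete nothing.
      d.add(new StringField("name", "John Smith", Field.Store.YES));
      iw.addDocument(d);

      iw.deleteDocuments(new Term("name", "John Smith"));
      iw.commit();  // commit only so a plain directory reader sees the delete

      DirectoryReader r = DirectoryReader.open(dir);
      System.out.println("live docs after delete: " + r.numDocs());  // expect 0
      r.close();
      iw.close();
    }
  }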

You don't need to call iw.commit() to make deletes visible: the next
reader refresh after the deletes are done will reflect them.
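
For reference, a sketch of that path (untested; it reuses the field names
from the wrapper quoted below, _triw, _reopen and _rmgr, and assumes NAME
is indexed as a StringField):

  // TrackingIndexWriter.deleteDocuments returns the generation of the change.
  long gen = _triw.deleteDocuments(new Term(NAME, "John Smith"));

  // Block until the ControlledRealTimeReopenThread has opened a searcher
  // covering that generation; no _iw.commit() is needed for visibility.
  _reopen.waitForGeneration(gen);

  IndexSearcher searcher = _rmgr.acquire();
  try {
    TopDocs hits = searcher.search(new TermQuery(new Term(NAME, "John Smith")), 1);
    // hits.totalHits should now be 0
  } finally {
    _rmgr.release(searcher);
  }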


Mike McCandless

http://blog.mikemccandless.com


On Mon, Dec 1, 2014 at 6:44 PM, Michael Sokolov
<msokolov@safaribooksonline.com> wrote:
> Yes that all looks reasonable.  Maybe there is a mismatch in the analysis
> chain?  I'm just throwing out wild guesses because I don't really see any
> problems in what you shared.  Also - if the problem really has something to
> do with ControlledRealTimeReopenThread, I'm not going to have the answer, so
> I apologize but I think I need to bow out.
>
>
> -Mike
>
>
> On 12/1/2014 6:22 PM, Badano Andrea wrote:
>>
>> Thanks for your reply!
>>
>> I try to delete documents using a term that matches a Document TextField:
>>
>>    private static final String NAME = "name";
>>
>>    private void store(String n, ... other fields ...) {
>>      Document d = new Document();
>>      d.add(new TextField(NAME, n, Field.Store.YES));
>>      ... add other fields ...
>>      _iw.addDocument(d);
>>    }
>>
>>    private void remove(String n) {
>>      Term t = new Term(NAME, n);
>>      _iw.deleteDocuments(t);
>>    }
>>
>> Is it possible to remove a document in this manner? Create a Term object
>> based on a document field of type TextField?
>>
>> I never close() any of the documents created in my wrapper.
>> All add/update/deletes go via the TrackingIndexWriter, while all commits
>> are called on the underlying IndexWriter.
>>
>> Regards,
>>
>> Andrea
>>
>>
>>
>>
>>
>>
>> On 1 Dec 2014, at 23:23, Michael Sokolov <msokolov@safaribooksonline.com>
>> wrote:
>>
>> It's impossible to tell since you didn't include the code for it, but my
>> advice would be to look at how the documents are being marked for deletion.
>> What are the terms being used to delete them?  Are you trying to use lucene
>> docids?
>>
>> -Mike
>>
>> On 12/1/2014 4:22 PM, Badano Andrea wrote:
>>>
>>> Hello,
>>>
>>> My apologies for a longish question.
>>>
>>> I am having some problems with a class that tries to ensure that a lucene
>>> index is
>>> always kept up-to-date with the contents of a mysql master database.
>>> Users add,
>>> modify, and delete items in the master database, and all changes to the
>>> master
>>> database are immediately propagated to the index. When the application
>>> starts up,
>>> all items present in the master database that are not present in the
>>> index are
>>> added to the index. Similarly, all items present in the index that are
>>> not present
>>> in the master database are removed from the index.
>>>
>>> I am trying to do this with code based on
>>> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
>>> Automatically copying data from the master database to the index seems to
>>> work.
>>> However, removing items from the index not present in the database does
>>> not seem to work.
>>>
>>> So I have this class:
>>>
>>> class IndexWrapper {
>>>    private final IndexWriter _iw;
>>>    private final TrackingIndexWriter _triw;
>>>    private final ReferenceManager<IndexSearcher> _rmgr;
>>>    private final ControlledRealTimeReopenThread<IndexSearcher> _reopen;
>>>    private final Analyzer _analyzer;
>>>    private AtomicLong _gen;
>>>    ...
>>> }
>>>
>>> that is set up as follows:
>>>
>>> _iw = new IndexWriter(directory, new
>>> IndexWriterConfig(Version.LUCENE_4_10_2, analyzer));
>>> _triw = new TrackingIndexWriter(_iw);
>>> _rmgr = new SearcherManager(_iw, true, null);
>>> _reopen = new ControlledRealTimeReopenThread<IndexSearcher>(_triw,_rmgr,
>>> 60.00, 0.1);
>>> _analyzer = analyzer;
>>> _gen = new AtomicLong(_triw.getGeneration());
>>> _reopen.start();
>>>
>>> First some code that fetches every doc in the index is called:
>>>
>>> _reopen.waitForGeneration(_gen.get()); // wait until the index is
>>> re-opened for the last update
>>> IndexSearcher searcher = _rmgr.acquire();
>>> try {
>>>    ... fetch all documents in index ...
>>> }
>>> finally {
>>>    _rmgr.release(searcher);
>>> }
>>>
>>> This returns all docs in the index. Later on, there is an attempt to
>>> remove some of these documents
>>> (the ones that no longer exist in the master database):
>>>
>>> long curr = _gen.get();
>>> _gen.compareAndSet(curr, _triw.deleteDocuments(termToRemove));
>>> _iw.commit();
>>>
>>> This code runs without any exceptions being thrown, but it does not seem
>>> to remove anything.
>>> If I enable logging, I see things such as:
>>>
>>> DW : anyChanges? numDocsInRam=0 deletes=false hasTickets:false
>>> pendingChangesInFullFlush: false
>>>
>>> Presumably the printout
>>>
>>> numDocsInRam=0
>>>
>>> means that commit() has not found any documents to delete. Also, if I add
>>> some extra logging to IndexWriter.deleteDocuments() like so:
>>>
>>> public void deleteDocuments(Term... terms) throws IOException {
>>>    ensureOpen();
>>>    try {
>>>      boolean dt = docWriter.deleteTerms(terms);
>>>      System.err.printf("DELETING TERMS : %s\n", terms);
>>>      System.err.printf("DT : %s\n", dt);
>>>      if (dt) {
>>>        processEvents(true, false);
>>>      }
>>>    } catch (OutOfMemoryError oom) {
>>>      tragicEvent(oom, "deleteDocuments(Term..)");
>>>    }
>>> }
>>>
>>> I can see printouts :
>>>
>>> DT : false
>>>
>>> So, an IndexWriter is given to a ReferenceManager which is then used to
>>> create an IndexSearcher
>>> that returns a set of documents. Yet later, when an attempt is made to
>>> remove some of these
>>> documents, the IndexWriter (or rather, its docWriter), cannot find these
>>> documents. Assuming
>>> that the IndexWriter is somehow involved in the initial fetch of all
>>> documents, I am confused as to how
>>> the IndexWriter a short while later cannot find some of these documents
>>> that have been marked
>>> (by my application) for deletion. I am pretty sure that the Term objects
>>> that are passed into
>>> deleteDocuments() are compatible with the documents previously returned
>>> by the IndexSearcher.
>>> So have I misunderstood the role of the IndexWriter as some kind of
>>> central gateway to all documents?
>>>
>>> Andrea
>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

