lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: updateDocument (sometimes) no longer deleting documents after Update to 4.6
Date Mon, 24 Feb 2014 11:10:55 GMT
The 30-second turnaround time in 3.6.x is absurd; if you turn on
IndexWriter's infoStream maybe it'd give a clue.  Or, capture a few
stack traces and post them.
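
A minimal sketch of the "capture a few stack traces" suggestion, assuming you can run code inside the affected JVM (for example from a watchdog thread started while maybeRefresh() is slow). It uses only the JDK's Thread.getAllStackTraces(); the class name StackDumper is hypothetical. Running the JDK's jstack tool against the process a few times gives comparable output without any code changes.

```java
import java.util.Map;

// Hypothetical helper: dumps every live thread's stack from inside the JVM,
// e.g. to capture what IndexWriter/SearcherManager threads are doing during
// a slow refresh. jstack <pid> produces similar output externally.
class StackDumper {

    /** Returns a textual dump of all live threads' stack traces. */
    static String dumpAllStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: quoted thread name plus its current state
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllStacks());
    }
}
```

Taking several dumps a second or two apart during one slow call makes it easier to see which frames stay constant.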

How are you creating the luceneDocumentToIndex?  You must ensure that
the business ID is in fact indexed as a field in the document,
otherwise the update won't find it.
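
To illustrate why the ID has to be indexed exactly as the update term spells it, here is a toy in-memory model, not Lucene's implementation, of the updateDocument contract: delete every document that contains the term among its indexed terms, then add the new one. ToyIndex, the "field:value" term strings and the encoding "id:0x2a" are all hypothetical; "id:0x2a" only stands in for the same ID written with different bytes (for instance a decimal string versus NumericUtils prefix-coded bytes). Real Lucene terms are (field, bytes) pairs that must match byte for byte.

```java
import java.util.*;

// Toy model (NOT Lucene's code) of updateDocument(term, doc):
// delete all docs whose *indexed* terms contain `term`, then add `doc`.
class ToyIndex {
    /** indexedTerms are searchable/deletable; stored values are only returned with hits. */
    record Doc(Set<String> indexedTerms, Map<String, String> stored) {}

    private final List<Doc> docs = new ArrayList<>();

    void updateDocument(String term, Doc doc) {
        // Only indexed terms are consulted; stored-only values are invisible here.
        docs.removeIf(d -> d.indexedTerms().contains(term));
        docs.add(doc);
    }

    /** Number of docs a TermQuery on `term` would hit. */
    long hits(String term) {
        return docs.stream().filter(d -> d.indexedTerms().contains(term)).count();
    }

    static Doc doc(String indexedTerm, String storedId) {
        return new Doc(Set.of(indexedTerm), Map.of("id", storedId));
    }

    public static void main(String[] args) {
        // Update term matches the indexed term exactly: one copy survives.
        ToyIndex ok = new ToyIndex();
        ok.updateDocument("id:42", doc("id:42", "42"));
        ok.updateDocument("id:42", doc("id:42", "42"));
        System.out.println("matching term:   " + ok.hits("id:42") + " hit(s)");

        // Doc indexed under a differently encoded (hypothetical) term:
        // the delete half of each update matches nothing.
        ToyIndex bad = new ToyIndex();
        bad.updateDocument("id:42", doc("id:0x2a", "42"));
        bad.updateDocument("id:42", doc("id:0x2a", "42"));
        System.out.println("mismatched term: " + bad.hits("id:0x2a") + " hit(s)");
    }
}
```

In the second case the update silently deletes nothing, so a query on the indexed form of the ID returns two documents, which resembles the symptom described in the quoted message.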


Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 24, 2014 at 5:33 AM,  <nospam@kaigrabfelder.de> wrote:
> Hi there,
>
> we recently updated our application from Lucene 3.0 to 3.6, with the effect
> that (despite using the SearcherManager functionality as described on
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html)
> calls to searcherManager.maybeRefresh() were incredibly slow, e.g. taking
> about 30 seconds after adding one document to an index of about 9000
> documents. I assumed that we did something wrong in the configuration, as
> 30 seconds can't be what NRT is meant to deliver ;-)
>
> So we migrated to the latest 4.6 version, and indexing speed is now indeed
> very good (the searcherManager.maybeRefreshBlocking() call takes only
> milliseconds to complete). But after some more testing we discovered that
> indexWriter.updateDocument( term, documentToIndex ) no longer works as
> expected, at least sometimes. It looks like the updateDocument method no
> longer reliably deletes the old document before adding the new one, with
> the result that older documents are returned by searches, which breaks our
> application.
>
> Unfortunately I'm not able to reproduce the issue in a simple unit test, but
> maybe one of the Lucene experts knows what we are doing wrong here. Not
> sure if it is relevant, but we are running on Windows with a 64-bit
> JDK 7, so MMapDirectory is being used.
>
> Our Index Writer is configured like this:
>
>         IndexWriterConfig conf = new IndexWriterConfig( Version.LUCENE_46,
>                 new LimitTokenCountAnalyzer( new DefaultAnalyzer(), Integer.MAX_VALUE ) );
>
>         conf.setOpenMode( OpenMode.APPEND );
>
>         IndexWriter indexWriter = new IndexWriter(
>                 FSDirectory.open( new File( directoryPath ) ), conf );
>
> SearcherManager is configured like this:
>
>         searcherManager = new SearcherManager(indexWriter, true, null);
>
> The analyzer that we are using looks like this:
>
>         public class DefaultAnalyzer extends Analyzer
>         {
>            @Override
>            protected TokenStreamComponents createComponents(final String fieldName,
>                    final Reader reader) {
>                  return new TokenStreamComponents(
>                          new WhitespaceTokenizer(LuceneSearchService.LUCENE_VERSION, reader));
>            }
>         }
>
> The update of the index looks like this:
>
>         // instead of 42 the unique business identifier is used
>         Long myUniqueBusinessId = 42L;
>         BytesRef ref = new BytesRef(NumericUtils.BUF_SIZE_LONG);
>         NumericUtils.longToPrefixCoded( myUniqueBusinessId.longValue(), 0, ref );
>         Term term = new Term( "MY_UNIQUE_BUSINESS_ID", ref );
>
>         // this method may be called multiple times with the same term and
>         // luceneDocumentToIndex parameter
>         indexWriter.updateDocument( term, luceneDocumentToIndex);
>
>         // After performing a couple of updates we execute
>         searcherManager.maybeRefreshBlocking();
>
>
> For searching we are using the following code:
>
>         searcher = searcherManager.acquire();
>         // luceneQuery is the query, filter is some sort of filtering that
>         // we apply, luceneSort is some sorting query
>         TopDocs topDocs = searcher.search( luceneQuery, filter, 1000, luceneSort );
>
> If we query for MY_UNIQUE_BUSINESS_ID, it returns multiple results instead
> of just one; this was not the case with either Lucene 3.0 or 3.6.
>
>
> To fix the issue I tried a couple of things, but to no avail. It still
> happens (not all the time, though) that Lucene returns two documents instead
> of just one when querying for MY_UNIQUE_BUSINESS_ID. What I tried:
> - setting setMaxBufferedDeleteTerms to 1 in the config:
>         conf.setMaxBufferedDeleteTerms( 1 );
> - explicitly deleting instead of just updating:
>         indexWriter.deleteDocuments( term );
> - ensuring that the field MY_UNIQUE_BUSINESS_ID is stored in the index and
>   not just analysed
> - trying to delete the document via indexWriter.tryDeleteDocument()
> - calling indexWriter.maybeMerge() after the update
> - calling indexWriter.commit() after the update
>
>
> Sorry for the lengthy post, but I wanted to include as much information as
> possible. Let me know if something is missing...
>
> Thanks in advance for your help ;-)
>
> Kai
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
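
Separately from the duplicate hits, one thing worth ruling out when searches return older versions of a document: SearcherManager hands out point-in-time searchers, so a searcher acquired before maybeRefreshBlocking() keeps seeing the pre-update documents until it is released and a new one is acquired, and the search snippet above does not show the matching searcherManager.release(searcher) call. The sketch below is a toy model of that acquire/refresh/release discipline, using plain lists in place of index readers; ToySearcherManager and Snapshot are hypothetical names and this is not Lucene's implementation.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (NOT Lucene's SearcherManager) of point-in-time searchers.
class ToySearcherManager {
    static final class Snapshot {
        final List<String> docs;
        final AtomicInteger refCount = new AtomicInteger(1); // 1 = held by the manager
        Snapshot(List<String> docs) { this.docs = List.copyOf(docs); } // frozen copy
    }

    private final List<String> writerDocs; // live writer state (stands in for IndexWriter)
    private volatile Snapshot current;

    ToySearcherManager(List<String> writerDocs) {
        this.writerDocs = writerDocs;
        this.current = new Snapshot(writerDocs);
    }

    /** Returns the current snapshot with its ref count incremented. */
    synchronized Snapshot acquire() {
        current.refCount.incrementAndGet();
        return current;
    }

    /** Callers must release what they acquired; a real manager closes at zero. */
    void release(Snapshot s) {
        s.refCount.decrementAndGet();
    }

    /** Publishes a new snapshot of the writer state and drops the old one. */
    synchronized void maybeRefreshBlocking() {
        Snapshot old = current;
        current = new Snapshot(writerDocs);
        release(old); // the manager's own reference to the old snapshot
    }

    public static void main(String[] args) {
        List<String> writerDocs = new ArrayList<>(List.of("doc42-v1"));
        ToySearcherManager mgr = new ToySearcherManager(writerDocs);

        Snapshot early = mgr.acquire();          // acquired before the update
        try {
            writerDocs.set(0, "doc42-v2");       // like updateDocument on the writer
            mgr.maybeRefreshBlocking();          // publish the new view
            Snapshot fresh = mgr.acquire();
            try {
                System.out.println("early sees " + early.docs + ", fresh sees " + fresh.docs);
            } finally {
                mgr.release(fresh);
            }
        } finally {
            mgr.release(early);                  // always release in finally
        }
    }
}
```

The design point the toy tries to capture is that a refresh never mutates an acquired snapshot; it only changes what the next acquire() returns.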


