lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject NRT facet issue (bug?), hard to reproduce, please advise
Date Fri, 11 Apr 2014 11:49:07 GMT
Hi all,

I have an issue using near-real-time (NRT) search with the taxonomy index. I
could really use some advice on how to debug or proceed with this issue.

The issue is as follows:

I index 100k documents, with about 40 fields each. For each field, I also
add a FacetField (the issue arises with both FacetField and
FloatAssociationFacetField). Each document has a unique number field
(client_no).

When just indexing and searching afterwards, all is fine.

When searching while indexing, the number of facets associated with a
document is sometimes too high, i.e. when collecting facets there is more
than one client_no on a single document, which of course should not be the
case.
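
To make the symptom concrete, the check can be sketched as follows. This is
only an illustration in plain Java: the class DuplicateFacetCheck and the
method findViolations are hypothetical names, and it assumes the client_no
facet values have already been collected per document into a map. How they
are collected from the facet results is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class DuplicateFacetCheck {

    /**
     * Returns the documents that carry more than one value for a facet
     * dimension that should be unique per document (e.g. client_no).
     * Keys are document ids, values are the facet values seen for that id.
     */
    static Map<Integer, List<String>> findViolations(Map<Integer, List<String>> valuesPerDoc) {
        Map<Integer, List<String>> violations = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : valuesPerDoc.entrySet()) {
            if (entry.getValue().size() > 1) {
                // Copy the list so callers cannot modify the input through the result.
                violations.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> seen = new LinkedHashMap<>();
        seen.put(0, Arrays.asList("1001"));          // expected: exactly one client_no
        seen.put(1, Arrays.asList("1002", "1003"));  // the faulty case described above
        System.out.println(findViolations(seen));    // prints only the violating documents
    }
}
```

Running such a check after every refresh would show at which point during
indexing the first faulty document appears.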

Before each search I call manager.maybeRefreshBlocking(), because I want the
most up-to-date results.

I have a taxonomy reader and an index reader combined in a ReferenceManager
(I created this before SearcherTaxonomyManager existed, but it behaves the
same way, with similar refcount logic).
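
The refcount logic itself is not shown in this message. A minimal sketch of
the general idea, assuming both readers share a single counter so that they
are always acquired and released as one unit, could look like this. ReaderPair
and its methods are hypothetical names for illustration, not Lucene API and
not the actual code in question.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for a searcher and a taxonomy reader that belong together.
final class ReaderPair<S, T> {
    final S searcher;
    final T taxonomy;

    // Starts at 1: the reference held by the manager that created the pair.
    private final AtomicInteger refCount = new AtomicInteger(1);

    ReaderPair(S searcher, T taxonomy) {
        this.searcher = searcher;
        this.taxonomy = taxonomy;
    }

    /** Tries to take a reference; fails once the pair has been fully released. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Drops a reference; returns true when this call released the last one. */
    boolean decRef() {
        int count = refCount.decrementAndGet();
        if (count < 0) {
            throw new IllegalStateException("decRef called more often than incRef");
        }
        return count == 0;
    }

    int getRefCount() {
        return refCount.get();
    }
}
```

The point of the single counter is that a search thread can never end up
holding a searcher from one refresh together with a taxonomy reader from
another.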

During indexing I commit every 5000 documents (not needed for NRT search,
but needed to prevent data loss should the application shut down). I commit
as follows:

    public void commit() throws DocumentIndexException
    {
        try
        {
            synchronized ( GlobalIndexCommitAndCloseLock.LOCK )
            {
                this.taxonomyWriter.commit();
                this.luceneIndexWriter.commit();
            }
        }
        catch ( final OutOfMemoryError | IOException e )
        {
            tryCloseWritersOnOOME( this.luceneIndexWriter, this.taxonomyWriter );
            throw new DocumentIndexException( e );
        }
    }

I use a standard IndexWriterConfig, and both the IndexWriter and the
TaxonomyWriter are backed by a RAMDirectory.

My test case indexes the 100k documents while another thread continuously
calls manager.maybeRefreshBlocking(). This is enough to sometimes cause the
taxonomy to be incorrect.

The number of indexing threads does not seem to influence the issue, as it
also appears when I have only one indexing thread.

I know it is a problem in the index itself, because when I write the index
to disk instead of RAM and reopen it in a fresh application, I see the same
behaviour.


I could really use some advice on how to debug or proceed with this issue.
If more info is needed, just ask.

Thanks in advance,

-Rob
