lucene-java-user mailing list archives

From Jose Carlos Canova <jose.carlos.can...@gmail.com>
Subject Re: NRT facet issue (bug?), hard to reproduce, please advise
Date Sat, 12 Apr 2014 14:15:57 GMT
One thing that may be relevant, and that I usually forget: if your object has
a unique identifier (client_no), that identifier must be used in the
overridden equals() method and in the computation of hashCode().
Otherwise, if you store the object in a collection and different routines
access or update that collection, you will get unpredictable results.
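
To illustrate the point, here is a minimal sketch of a class whose identity is
based on client_no only. The class name ClientRecord and the extra name field
are hypothetical and only there for the example; they are not taken from the
code discussed in this thread.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class; only the client_no identifier comes from the thread.
final class ClientRecord {
    private final long clientNo;  // unique identifier (client_no)
    private final String name;    // illustrative payload, not part of identity

    ClientRecord(long clientNo, String name) {
        this.clientNo = clientNo;
        this.name = name;
    }

    long getClientNo() { return clientNo; }

    String getName() { return name; }

    // Two records are equal when their client_no matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ClientRecord)) return false;
        return clientNo == ((ClientRecord) other).clientNo;
    }

    // hashCode is derived from the same field that equals() compares.
    @Override
    public int hashCode() {
        return Long.hashCode(clientNo);
    }
}

class ClientRecordDemo {
    public static void main(String[] args) {
        Set<ClientRecord> seen = new HashSet<>();
        seen.add(new ClientRecord(42L, "first version"));
        // Same client_no, so the set treats this as a duplicate.
        seen.add(new ClientRecord(42L, "updated version"));
        System.out.println(seen.size()); // prints 1
    }
}
```

Without the two overrides, the HashSet above would hold two entries for the
same client_no, because the default equals() compares object references.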


On Fri, Apr 11, 2014 at 10:59 AM, Shai Erera <serera@gmail.com> wrote:

> Hi
>
> I am not sure how more than one client_no field ends up in a document, and
> I'm not sure it's related to the taxonomy at all.
>
> However, looking at the code example you pasted above, and since you
> mention that you index+commit in one thread, while another thread does the
> reopen, I wonder if that's the issue: you first commit the taxo, then
> commit the index. But what if a new document makes it into the index after
> you committed to taxo, with a new client_no? In that case, the reopening
> thread will discover an "older" taxonomy, while the index will have
> categories with ordinals larger than the taxonomy's greatest ordinal?
>
> I also think that it's a mistake to commit and reopen in two separate
> threads. If possible, I suggest that you do that always in the same thread,
> and in that order: first commit the index, then the taxonomy. That way, if
> a document goes into the index (and new facets to the taxonomy) after the
> index.commit(), then when you reopen the worst case is that the taxonomy is
> "ahead" of the index, which is fine. When you reopen, also reopen in the
> same order.
>
> Could you try that and see if that resolves your issue? That said, I don't
> understand how this can lead to more than one client_no ending up in one
> document, unless there's also a concurrency bug in the indexing code ... or
> I misunderstood the issue.
>
> Shai
>
>
> On Fri, Apr 11, 2014 at 2:49 PM, Rob Audenaerde <rob.audenaerde@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I have an issue using near-real-time search with the taxonomy. I could
> > really use some advice on how to debug this issue.
> >
> > The issue is as follows:
> >
> > I index 100k documents, with about 40 fields each. For each field, I also
> > add a FacetField (the issue arises with both FacetField and
> > FloatAssociationFacetField). Each document has a unique number field
> > (client_no).
> >
> > When just indexing and searching afterwards, all is fine.
> >
> > When searching while indexing, the number of facets associated with a
> > document is sometimes too high, i.e. when collecting facets there is more
> > than one client_no on one document, which of course should not be the case.
> >
> > Before each search, I call manager.maybeRefreshBlocking(), because I want
> > the most up-to-date results.
> >
> > I have a taxonomy and index reader combined in a ReferenceManager (I
> > created this before the SearcherTaxonomyManager existed, but it behaves
> > exactly the same, with similar refcount logic).
> >
> > During indexing I commit every 5000 documents (not needed for the NRT
> > search, but needed to prevent data loss if the application should shut
> > down). I commit as follows:
> >
> >     public void commit() throws DocumentIndexException
> >     {
> >         try
> >         {
> >             synchronized ( GlobalIndexCommitAndCloseLock.LOCK )
> >             {
> >                 this.taxonomyWriter.commit();
> >                 this.luceneIndexWriter.commit();
> >             }
> >         }
> >         catch ( final OutOfMemoryError | IOException e )
> >         {
> >             tryCloseWritersOnOOME( this.luceneIndexWriter, this.taxonomyWriter );
> >             throw new DocumentIndexException( e );
> >         }
> >     }
> >
> > I use a standard IndexWriterConfig, and both the IndexWriter and the
> > TaxonomyWriter use a RAMDirectory.
> >
> > My testcase indexes the 100k documents, while another thread is
> > continuously calling 'manager.maybeRefreshBlocking()'. This is enough to
> > sometimes cause the taxonomy to be incorrect.
> >
> > The number of indexing threads does not seem to influence the issue, as
> > it also appears when I have only 1 indexing thread.
> >
> > I know it is an index problem, because when I write the index to file
> > instead of RAM and reopen it in a clean application, I see the same
> > behaviour.
> >
> >
> > I could really use some advice on how to debug this issue. If more
> > info is needed, just ask.
> >
> > Thanks in advance,
> >
> > -Rob
> >
>
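
For reference, a minimal sketch of the commit order suggested in the quoted
reply: index first, then taxonomy, both under one lock. It shows only the
ordering. The Committer interface and the OrderedCommitter class are
hypothetical stand-ins so the example runs without Lucene on the classpath;
in the real code the two fields would be the IndexWriter and the
TaxonomyWriter. Whether this ordering resolves the reported issue is not
established here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for anything with a commit() method, such as an
// index writer or a taxonomy writer. This is not a Lucene type.
interface Committer {
    void commit() throws IOException;
}

// Commits the index first and the taxonomy second, under a single lock,
// which is the order suggested in the quoted reply.
final class OrderedCommitter {
    private final Object lock = new Object();
    private final Committer index;
    private final Committer taxonomy;

    OrderedCommitter(Committer index, Committer taxonomy) {
        this.index = index;
        this.taxonomy = taxonomy;
    }

    void commit() throws IOException {
        synchronized (lock) {
            index.commit();     // step 1: index
            taxonomy.commit();  // step 2: taxonomy
        }
    }
}

class OrderedCommitDemo {
    public static void main(String[] args) throws IOException {
        List<String> order = new ArrayList<>();
        // The lambdas just record the order in which commit() is called.
        OrderedCommitter committer = new OrderedCommitter(
                () -> order.add("index"),
                () -> order.add("taxonomy"));
        committer.commit();
        System.out.println(order); // prints [index, taxonomy]
    }
}
```

The reply also suggests calling commit and reopen from the same thread and
reopening in the same order; the sketch does not cover the reopen side.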
