lucene-java-user mailing list archives

From Hans Lund <ha.l...@gmail.com>
Subject Re: merge problems
Date Tue, 11 Oct 2016 16:08:30 GMT
Hmm, you're right. Once it revealed a bug in our indexing code I stopped
wondering ;-) but I have now tried to write small tests that show the
behavior, so far without success. I'm pretty sure I can reproduce it by
re-introducing our indexing bug; unfortunately it only occurs after some
hours of parsing and indexing Wikipedia dumps. From there I'll try to
simplify it into a test that reproduces the setup.
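
As far as this thread shows, the bug came down to one field name, "id",
being added with point values in one document and as a plain indexed term
everywhere else. A minimal sketch of an application-side guard that would
flag such a mismatch at add time, rather than hours later during a merge,
could look like the following. The class FieldSchemaGuard and its check
method are hypothetical and not part of Lucene; only the idea of comparing
the point dimension count comes from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene class: remembers for each field name
// whether it was first seen with point values, and rejects later conflicts.
final class FieldSchemaGuard {
    private final Map<String, Boolean> hasPoints = new HashMap<>();

    /**
     * @param fieldName           name of the field about to be added
     * @param pointDimensionCount 0 for a field without points, > 0 otherwise
     * @throws IllegalStateException if the field was seen before with a
     *                               different "has points" setting
     */
    void check(String fieldName, int pointDimensionCount) {
        boolean points = pointDimensionCount != 0;
        // putIfAbsent returns the earlier value, or null on first sight.
        Boolean previous = hasPoints.putIfAbsent(fieldName, points);
        if (previous != null && previous != points) {
            throw new IllegalStateException(
                "field \"" + fieldName + "\" was first added with points="
                    + previous + " but is now added with points=" + points);
        }
    }
}
```

The guard would be called once per field before the document is handed to
the writer, passing field.name() and field.fieldType().pointDimensionCount().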

The setup we use is quite straightforward: MMapDirectory and an NRT setup.
The only tailored functionality is our own IndexDeletionPolicy, which uses a
timestamp added to the user data of each index commit to keep a number of
snapshots while honoring a maximum retention period. I don't suspect it to
be the cause, but if the field infos from another snapshot were used in the
merge, that could cause problems.
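
To make the retention rule above concrete, here is a minimal sketch of the
selection logic only. It is not our actual policy and does not use the
Lucene API: CommitInfo and RetentionSketch are hypothetical stand-ins, and
the timestamp that a real policy would read from the commit user data is
passed in directly. It assumes the commits are listed oldest first and that
the newest commit is always kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index commit; a real policy would read the
// timestamp from the commit's user data instead.
final class CommitInfo {
    final long generation;
    final long timestampMillis;

    CommitInfo(long generation, long timestampMillis) {
        this.generation = generation;
        this.timestampMillis = timestampMillis;
    }
}

final class RetentionSketch {
    /**
     * Picks the commits to delete: anything beyond the newest maxSnapshots
     * commits, plus anything older than maxAgeMillis. The newest commit
     * (last in the list) is never selected.
     */
    static List<CommitInfo> selectForDeletion(List<CommitInfo> commitsOldestFirst,
                                              int maxSnapshots,
                                              long maxAgeMillis,
                                              long nowMillis) {
        List<CommitInfo> toDelete = new ArrayList<>();
        int newest = commitsOldestFirst.size() - 1;
        for (int i = 0; i < newest; i++) {
            CommitInfo commit = commitsOldestFirst.get(i);
            // Number of commits from this one up to and including the newest.
            boolean tooMany = commitsOldestFirst.size() - i > maxSnapshots;
            boolean tooOld = nowMillis - commit.timestampMillis > maxAgeMillis;
            if (tooMany || tooOld) {
                toDelete.add(commit);
            }
        }
        return toDelete;
    }
}
```

In a real IndexDeletionPolicy the selected commits would then be deleted
from within the policy's commit callbacks; that wiring is left out here.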

Hans Lund

On Tue, Oct 11, 2016 at 12:07 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Hmm, that should be "OK" from Lucene's standpoint.
>
> I mean, it should not result in strange merge exceptions later on.
>
> I think there's a bug somewhere in Lucene's efforts to pretend it's
> fully schema-less ... I'll try to reproduce this.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Oct 11, 2016 at 4:38 AM, Hans Lund <ha.lund@gmail.com> wrote:
> > Turned out to be much simpler - we had added a new 'dynamic' field to
> > a stats doc: a count of articles based on the identified language code.
> > Having a set of test documents in German, English and Swedish, no one had
> > suspected the obvious: that the language detection categorized a single
> > document as being Indonesian, making the stats count id:1.
> >
> > I realized that the debug output I added printed everything other than
> > the interesting field (it iterated over the already added fields, not the
> > field causing the error later on) ;-)
> >
> >
> >
> >
> >
> > On Mon, Oct 10, 2016 at 4:32 PM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> >> It looks like the field infos of your index went out of sync with data
> >> stored in the files about points.
> >>
> >> Can you run CheckIndex on your index (potentially with the `-fast` option
> >> so that it only verifies checksums)? It could be that one of these two
> >> parts of the index got corrupted.
> >>
> >> Since you were able to modify the way add(IndexableField) is implemented,
> >> I'm wondering if you are running a fork of Lucene? If yes, maybe you made
> >> some changes that triggered this bug?
> >>
> >> Otherwise is your application:
> >>  - using IndexWriter.addIndexes?
> >>  - customizing merging in some way, eg. by wrapping the merge readers?
> >>
> >> On Tue, Oct 4, 2016 at 4:40 PM, Hans Lund <ha.lund@gmail.com> wrote:
> >>
> >> > After upgrading to 6.2 we are having problems during merges (after
> >> > running for a while).
> >> >
> >> > When the problem occurs it's always complaining about the same field
> >> > and throws:
> >> >
> >> > java.lang.IllegalArgumentException: field="id" did not index point values
> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
> >> >     at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
> >> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
> >> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
> >> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
> >> >
> >> >
> >> > To figure out where we messed up - I have added some ugly logging to
> >> > Document:
> >> >
> >> > public final void add(IndexableField field) {
> >> >     // Debug hook: fires when an "id" field arrives with point dimensions.
> >> >     if ("id".equals(field.name())
> >> >             && field.fieldType().pointDimensionCount() != 0) {
> >> >         System.err.println("Point value detected");
> >> >         // Prints the fields already on the document, not `field` itself.
> >> >         for (IndexableField i : fields) {
> >> >             System.err.println(i);
> >> >         }
> >> >     }
> >> >     fields.add(field);
> >> > }
> >> >
> >> > In the hope of intercepting the document we messed up.
> >> >
> >> > But to my surprise, toString on the suspected field just says
> >> > (contains a URN):
> >> >
> >> > indexed,omitNorms,indexOptions=DOCS<id:urn:wiki:doc:YEL:57028#1-1>
> >> >
> >> > So, any hints as to why field.fieldType().pointDimensionCount() != 0,
> >> > and any suggestions as to what might cause this?
> >> >
> >> > Regards
> >> > Hans Lund
> >> >
> >>
>
