Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <CALUM-yu=y2_7XOYeLW9SEKTJuaJ+01nRkNEn4O+y3rWYhB-jnA@mail.gmail.com>
References: <CALUM-ysSmck-Stk_f=HovbeDR6QVwTh3x--zNrM1wg7ds-FsMw@mail.gmail.com>
 <CAPsWd+OxyG--=RrAzHmQf+KJNabRY-kKQoR1x0dgN80BvzCNoQ@mail.gmail.com>
 <CALUM-ys+NjNrHtk4iBpa+_xOvrDR7WSimPm9EYQMw5=g+UvMkw@mail.gmail.com>
 <CAL8PwkaGw4C1LJ-VFW=BwaVkuNA3jWW3iTnWxRhrq77=oDgmDg@mail.gmail.com> <CALUM-yu=y2_7XOYeLW9SEKTJuaJ+01nRkNEn4O+y3rWYhB-jnA@mail.gmail.com>
From: Michael McCandless <lucene@mikemccandless.com>
Date: Tue, 11 Oct 2016 19:58:08 -0400
Message-ID: <CAL8PwkYXXWHm7Y=nLL7LtO9mMFwD7DTG1N6qZGhyG-GP2aGK_w@mail.gmail.com>
Subject: Re: merge problems
To: Hans Lund <ha.lund@gmail.com>
Cc: Lucene Users <java-user@lucene.apache.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
archived-at: Tue, 11 Oct 2016 23:58:44 -0000

OK I have a small test case showing the issue!

I opened https://issues.apache.org/jira/browse/LUCENE-7491

Thanks for reporting this, Hans.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Oct 11, 2016 at 12:08 PM, Hans Lund <ha.lund@gmail.com> wrote:
> Hmm, you're right. Once it revealed a bug in our indexing code I stopped
> wondering ;-) But now I have tried to create small tests to show the
> behavior, so far without success. I'm pretty sure that I can reproduce it by
> re-introducing our indexing bug; unfortunately it only occurs after some
> hours of parsing and indexing Wikipedia dumps. From there I'll try to
> simplify it into a test that reproduces the setup.
>
> The setup we use is quite straightforward: MMapDirectory and an NRT setup.
> The only tailored functionality is our own IndexDeletionPolicy, which uses a
> timestamp added to the index commit's user data to keep a number of
> snapshots while honoring a max retention period. I don't suspect it to be
> the cause, but if field infos from another snapshot were used in the merge,
> that could cause problems.
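
A minimal sketch of the retention rule described above, written as plain Java
without any Lucene dependency. It assumes commits are represented only by
their timestamps, sorted oldest first, and that the newest commit is always
kept. The class name, method name, and parameters are hypothetical; the actual
policy in this thread was not posted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Lucene-free model of "keep N snapshots, honor a max age".
final class RetentionSketch {

    /**
     * Returns the positions of the commits that should be deleted.
     * commitTimesMillis must be sorted oldest first.
     */
    static List<Integer> toDelete(List<Long> commitTimesMillis,
                                  int maxSnapshots,
                                  long maxAgeMillis,
                                  long nowMillis) {
        List<Integer> doomed = new ArrayList<>();
        int n = commitTimesMillis.size();
        // Stop before the last element so the newest commit always survives.
        for (int i = 0; i < n - 1; i++) {
            // (n - i) is how many commits remain from position i to the newest.
            boolean beyondCount = (n - i) > maxSnapshots;
            boolean tooOld = nowMillis - commitTimesMillis.get(i) > maxAgeMillis;
            if (beyondCount || tooOld) {
                doomed.add(i);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        List<Long> times = Arrays.asList(1000L, 2000L, 3000L, 4000L);
        System.out.println(toDelete(times, 3, 2500L, 5000L));
    }
}
```

In a real policy the timestamps would presumably be read from each commit's
user data, and the selected commits deleted from within the policy callbacks.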
>
> Hans Lund
>
> On Tue, Oct 11, 2016 at 12:07 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Hmm, that should be "OK" from Lucene's standpoint.
>>
>> I mean, it should not result in strange merge exceptions later on.
>>
>> I think there's a bug somewhere in Lucene's efforts to pretend it's
>> fully schema-less ... I'll try to reproduce this.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Oct 11, 2016 at 4:38 AM, Hans Lund <ha.lund@gmail.com> wrote:
>> > It turned out to be much simpler. We had added a new 'dynamic' field to
>> > a stats doc: a count of articles based on the identified language code.
>> > With a set of test documents in German, English, and Swedish, no one had
>> > suspected the obvious: the language detection categorized a single
>> > document as Indonesian (language code "id"), making the stats count
>> > id:1.
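
A minimal sketch of how a language code used directly as a field name can
collide with an existing field, and one way to guard against it, written as
plain Java without Lucene. The reserved field names, the "lang_count_" prefix,
and the class and method names are all hypothetical and are not taken from the
application discussed in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for naming per-language counter fields.
final class StatsFieldNames {

    // Assumed set of field names that already have a fixed meaning.
    static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("id", "title", "body"));

    // Assumed prefix that keeps dynamic names out of the reserved namespace.
    static final String PREFIX = "lang_count_";

    // Uses the language code itself as the field name: "id" for Indonesian.
    static String unsafeName(String languageCode) {
        return languageCode;
    }

    // Namespaces the language code so it cannot match a reserved name.
    static String safeName(String languageCode) {
        return PREFIX + languageCode;
    }

    static boolean collides(String fieldName) {
        return RESERVED.contains(fieldName);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"de", "en", "sv", "id"}) {
            System.out.println(code
                    + " unsafe collides=" + collides(unsafeName(code))
                    + " safe collides=" + collides(safeName(code)));
        }
    }
}
```

Prefixing is only one option; rejecting any dynamic name that matches a
reserved field before the document is built would work as well.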
>> >
>> > I also realized that the debug output I added printed everything other
>> > than the interesting field (it iterated over the already-added fields,
>> > not the field causing the error later on ;-)
>> >
>> >
>> > On Mon, Oct 10, 2016 at 4:32 PM, Adrien Grand <jpountz@gmail.com> wrote:
>> >
>> >> It looks like the field infos of your index went out of sync with the
>> >> data stored in the files about points.
>> >>
>> >> Can you run CheckIndex on your index (potentially with the `-fast`
>> >> option so that it only verifies checksums)? It could be that one of
>> >> these two parts of the index got corrupted.
>> >>
>> >> Since you were able to modify the way add(IndexableField) is
>> >> implemented, I'm wondering if you are running a fork of Lucene? If yes,
>> >> maybe you made some changes that triggered this bug?
>> >>
>> >> Otherwise, is your application:
>> >>  - using IndexWriter.addIndexes?
>> >>  - customizing merging in some way, e.g. by wrapping the merge readers?
>> >>
>> >> On Tue, Oct 4, 2016 at 16:40, Hans Lund <ha.lund@gmail.com> wrote:
>> >>
>> >> > After upgrading to 6.2 we are having problems during merges (after
>> >> > running for a while).
>> >> >
>> >> > When the problem occurs, it always complains about the same field
>> >> > and throws:
>> >> >
>> >> > java.lang.IllegalArgumentException: field="id" did not index point values
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>> >> >     at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>> >> >     at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>> >> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>> >> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>> >> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>> >> >
>> >> >
>> >> > To figure out where we messed up, I have added some ugly logging to
>> >> > Document:
>> >> >
>> >> > public final void add(IndexableField field) {
>> >> >     // Debug aid: flag any "id" field that carries point values.
>> >> >     if ("id".equals(field.name())
>> >> >             && field.fieldType().pointDimensionCount() != 0) {
>> >> >         System.err.println("Point value detected");
>> >> >         // Prints the fields added so far, not the incoming field.
>> >> >         for (IndexableField i : fields) {
>> >> >             System.err.println(i);
>> >> >         }
>> >> >     }
>> >> >     fields.add(field);
>> >> > }
>> >> >
>> >> > In the hope of intercepting the document we messed up.
>> >> >
>> >> > But to my surprise, toString on the suspected field (which contains a
>> >> > URN) just says:
>> >> >
>> >> > indexed,omitNorms,indexOptions=DOCS<id:urn:wiki:doc:YEL:57028#1-1>
>> >> >
>> >> > So, any hints as to why field.fieldType().pointDimensionCount() != 0,
>> >> > and any suggestions as to what might cause this?
>> >> >
>> >> > Regards
>> >> > Hans Lund
>> >> >
>> >>
>
>
